Preface


Traditionally, physics is divided into two fields of activity: theoretical and
experimental. As a consequence of the stunning increase in computer power and of
the development of more powerful numerical techniques, a new branch of physics
has been established over the last decades: Computational Physics. This new branch
emerged as a spin-off of what is nowadays commonly called computer simulations.
Such simulations play an increasingly important role in physics and related
sciences as well as in industrial applications, and they serve two purposes, namely:

•  Direct  simulation  of  physical  processes  such  as 

o  Molecular  dynamics  or 
o  Monte  Carlo  simulation  of  physical  processes 

•  Solution  of  complex  mathematical  problems  such  as 

o  Differential  equations 
o  Minimization  problems 
o  High-dimensional  integrals  or  sums 

This  book  addresses  all  these  scenarios  on  a  very  basic  level.  It  is  addressed 
to  lecturers  who  will  have  to  teach  a  basic  course/basic  courses  in  Computational 
Physics  or  numerical  methods  and  to  students  as  a  companion  in  their  first  steps  into 
the  realm  of  this  fascinating  field  of  modern  research.  Following  these  intentions 
this  book  was  divided  into  two  parts.  Part  I  deals  with  deterministic  methods  in 
Computational  Physics.  We  discuss,  in  particular,  numerical  differentiation  and 
integration,  the  treatment  of  ordinary  differential  equations,  and  we  present  some 
notes  on  the  numerics  of  partial  differential  equations.  Each  section  within  this  part 
of  the  book  is  complemented  by  numerous  applications.  Part  II  of  this  book  provides 
an  introduction  to  stochastic  methods  in  Computational  Physics.  In  particular,  we 
will  examine  how  to  generate  random  numbers  following  a  given  distribution, 
summarize  the  basics  of  stochastics  in  order  to  establish  the  necessary  background 
to  understand  techniques  like  MARKOV-Chain  Monte  Carlo.  Finally,  algorithms  of 
stochastic optimization are discussed. Again, numerous examples from physics, such
as diffusion processes or the POTTS model, are investigated exhaustively. In addition, this
book  contains  an  appendix  that  augments  the  main  parts  of  the  book  with  a  detailed 
discussion  of  supplementary  topics. 

This book is not meant to be just a collection of algorithms which can
immediately be applied to the various problems that may arise in Computational
Physics. On the contrary, the aim of this book is to provide the reader with a
mathematically well-founded look behind the scenes of Computational Physics.
Thus, particular emphasis is placed on a clear analysis of the various topics and,
in some cases, on providing the means necessary to understand the background
of these methods. Although there is a vast amount of excellent literature on
Computational Physics, most of these books seem to concentrate either on
deterministic methods or on stochastic methods. It is not our goal to compete with
these rather specific works. Instead, it is the particular focus of this book to
discuss deterministic methods on a par with stochastic methods and to motivate these
methods by concrete examples from physics and/or engineering.

Nevertheless, a certain overlap with existing literature was unavoidable and we
apologize if we were not able to cite appropriately all existing works which are of
importance and which influenced this book. However, we believe that by putting the
emphasis on an exact mathematical analysis of both deterministic and stochastic
methods, we have created a stimulating presentation of the basic concepts applied in
Computational Physics.

If we assume two basic courses in Computational Physics to be part of the
curriculum, nicknamed here Computational Physics 101 and Computational Physics
102, then we would like to suggest presenting/studying the various topics of this book
according to the following syllabus:

•  Computational  Physics  101: 

-  Chapter  1:  Some  Basic  Remarks 

-  Chapter  2:  Numerical  Differentiation 

-  Chapter  3:  Numerical  Integration 

-  Chapter  4:  The  Kepler  Problem 

-  Chapter  5:  Ordinary  Differential  Equations:  Initial  Value  Problems 

-  Chapter  6:  The  Double  Pendulum 

-  Chapter  7:  Molecular  Dynamics

-  Chapter  8:  Numerics  of  Ordinary  Differential  Equations:  Boundary  Value 
Problems 

-  Chapter  9:  The  One-Dimensional  Stationary  Heat  Equation 

-  Chapter  10:  The  One-Dimensional  Stationary  SCHRÖDINGER  Equation

-  Chapter  12:  Pseudo-random  Number  Generators 

•  Computational  Physics  102: 

-  Chapter  11:  Partial  Differential  Equations

-  Chapter  13:  Random  Sampling  Methods 

-  Chapter  14:  A  Brief  Introduction  to  Monte  Carlo  Methods 

-  Chapter  15:  The  Ising  Model 


-  Chapter  16:  Some  Basics  of  Stochastic  Processes 

-  Chapter  17:  The  Random  Walk  and  Diffusion  Theory 

-  Chapter  18:  MARKOV-Chain  Monte  Carlo  and  the  Potts  Model 

-  Chapter  19:  Data  Analysis 

-  Chapter  20:  Stochastic  Optimization 

The various chapters are augmented by problems of medium complexity which
help the reader to better understand the numerical aspects of the topics discussed in this book.

Although  the  manuscript  has  been  carefully  checked  several  times,  we  cannot 
exclude  that  some  errors  escaped  our  scrutiny.  We  apologize  in  advance  and  would 
highly  appreciate  reports  of  potential  mistakes  or  typos. 

Throughout the book SI units are used unless stated otherwise.


Graz,  Austria 
July  2015 


Benjamin A. Stickler
Ewald Schachinger

Chapter 1

Some  Basic  Remarks 


1.1  Motivation 

Computational  Physics  aims  at  solving  physical  problems  by  means  of  numerical 
methods developed in the field of numerical analysis [1,2]. According to I. Jacques
and C. Judd [3], numerical analysis is defined as follows:

Numerical analysis is concerned with the development and analysis of methods for the
numerical solution of practical problems.

Although  the  term  practical  problems  remained  unspecified  in  this  definition,  it 
is  certainly  necessary  to  reflect  on  ways  to  find  approximate  solutions  to  complex 
problems  which  occur  regularly  in  natural  sciences.  In  fact,  in  most  cases  it  is  not 
possible  to  find  analytic  solutions  and  one  must  rely  on  good  approximations.  Let 
us  give  some  examples. 

Consider the definite integral

\[
  \int_a^b \mathrm{d}x \, \exp\left(-x^2\right) , \tag{1.1}
\]

which, for instance, may occur when it is required to calculate the probability that
an event following a normal distribution takes on a value within the interval $[a, b]$,
where $a, b \in \mathbb{R}$. In contrast to the much simpler integral

\[
  \int_a^b \mathrm{d}x \, \exp(x) = \exp(b) - \exp(a) , \tag{1.2}
\]

the integral (1.1) cannot be solved analytically because there is no elementary
function which differentiates to $\exp(-x^2)$. Hence, we have to approximate this
integral in such a way that the approximation is accurate enough for our purpose.
This example illustrates that even mathematical expressions which appear quite
simple at first glance may need a closer inspection when a numerical estimate
for the expression is required. In fact, most numerical methods we will encounter
in this book were designed before the invention of modern computers or
calculators. However, the applicability of these methods has increased, and is still
increasing drastically, with the development of ever more powerful machines. We
give  another  example,  namely  the  oscillation  of  a  pendulum.  We  know  from  basic 
mechanics  [4-8]  that  the  time  evolution  of  a  frictionless  pendulum  of  mass  m  and 
length  l  in  a  gravitational  field  is  modeled  by  the  differential  equation 

\[
  \ddot{\theta} + \frac{g}{l} \sin(\theta) = 0 . \tag{1.3}
\]

The solution of this equation describes the oscillatory motion of the pendulum
around the origin O within a two-dimensional plane (Fig. 1.1). Here $\theta$ is the angular
displacement and $g$ is the acceleration due to gravity. Furthermore, a common
situation is described by initial conditions of the form:

\[
  \begin{cases}
    \theta(0) = \theta_0 , \\
    \dot{\theta}(0) = 0 .
  \end{cases} \tag{1.4}
\]

Fig. 1.1 Schematic illustration of the pendulum

For small initial angular displacements, $\theta_0 \ll 1$, we set $\sin(\theta) \approx \theta$ in Eq. (1.3)
and obtain the differential equation of the harmonic oscillator:

\[
  \ddot{\theta} + \frac{g}{l} \theta = 0 . \tag{1.5}
\]

Together with the initial conditions (1.4) we arrive at the solution

\[
  \theta(t) = \theta_0 \cos(\omega t) , \tag{1.6}
\]

with $\omega = \sqrt{g/l}$. The period $\tau$ of the pendulum follows immediately:

\[
  \tau = \frac{2\pi}{\omega} = 2\pi \sqrt{\frac{l}{g}} . \tag{1.7}
\]
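However, if the approximation of a small angular displacement, $\theta_0 \ll 1$, is not
applicable, expressions (1.6) and (1.7) will not be valid. Thus, it is advisable to
apply energy conservation in order to arrive at analytic results. The total energy of
the pendulum is given by:

\[
  E = \frac{1}{2} m v^2 + m g l \left[ 1 - \cos(\theta) \right]
    = \frac{1}{2} m v_0^2 + m g l \left[ 1 - \cos(\theta_0) \right] . \tag{1.8}
\]

Here $v$ is the velocity of the point mass $m$ and $v_0$ and $\theta_0$ are defined by the initial
conditions (1.4). Since $\dot{\theta}(0) = 0$ we have

\[
  E = m g l \left[ 1 - \cos(\theta_0) \right]
    = 2 m g l \sin^2\left( \frac{\theta_0}{2} \right) , \tag{1.9}
\]

where we made use of the relation $1 - \cos(x) = 2 \sin^2(x/2)$. We use this result in
Eq. (1.8) and arrive at:

\[
  v^2 = 4 g l \left[ \sin^2\left( \frac{\theta_0}{2} \right) - \sin^2\left( \frac{\theta}{2} \right) \right] . \tag{1.10}
\]

Since $v^2 = l^2 \dot{\theta}^2$ we have

\[
  \dot{\theta}^2 = 4 \frac{g}{l} \left[ \sin^2\left( \frac{\theta_0}{2} \right) - \sin^2\left( \frac{\theta}{2} \right) \right] . \tag{1.11}
\]

Separation of variables yields, for the quarter period in which $\theta$ increases from $0$
to $\theta_0$ and with the time $t$ counted from the passage through $\theta = 0$,

\[
  \int_0^{\theta} \frac{\mathrm{d}\varphi}{\sqrt{k^2 - \sin^2(\varphi/2)}} = 2 \sqrt{\frac{g}{l}} \, t , \tag{1.12}
\]

with $k = \sin(\theta_0/2)$. For $t = \tau/4$ we have $\theta = \theta_0$ and we obtain for the period

\[
  \tau = 2 \sqrt{\frac{l}{g}} \int_0^{\theta_0} \frac{\mathrm{d}\varphi}{\sqrt{k^2 - \sin^2(\varphi/2)}} . \tag{1.13}
\]

Let us transform the above integral into a more convenient form with the help of
the substitution $k \sin(\alpha) = \sin(\varphi/2)$. Thus, $\alpha \in [0, \pi/2]$ and a straightforward
calculation yields:

\[
  \tau = 4 \sqrt{\frac{l}{g}} \int_0^{\pi/2} \frac{\mathrm{d}\alpha}{\sqrt{1 - k^2 \sin^2(\alpha)}}
       = 4 \sqrt{\frac{l}{g}} \, K_1(k) . \tag{1.14}
\]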

The function $K_1(k)$ introduced in (1.14) for $k \in \mathbb{R}$ is referred to as the complete
elliptic integral of the first kind [9-12]. All these manipulations did not really result
in a simplification of the problem at hand because we are still confronted with
the integral in Eq. (1.14) which cannot be evaluated without the use of additional
approximations which will, in the end, result in a numerical solution of the problem.
A natural way to proceed would be to expand the complete elliptic integral in a
power series up to order $N$, where $N$ is chosen in such a way that the truncation
error becomes negligible. We can find the desired expression in any text on
special functions [9, 11, 12]. It reads

\[
  K_1(k) = \frac{\pi}{2} \sum_{n=0}^{N} \left[ \frac{(2n)!}{2^{2n} (n!)^2} \right]^2 k^{2n} + R_N(k) . \tag{1.15}
\]
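As a rough illustration of how the truncated series (1.15) might be used in practice, the following Python sketch evaluates $K_1(k)$ and the period (1.14). The function names, the default number of terms and the values of `length` and `g` are assumptions made for this example only; the remainder $R_N(k)$ is simply neglected.

```python
import math

def elliptic_k_series(k, n_terms=50):
    """Truncated power series (1.15) for K_1(k); the remainder R_N is neglected."""
    total = 0.0
    coeff = 1.0  # holds (2n)! / (2^(2n) (n!)^2), starting with n = 0
    for n in range(n_terms + 1):
        total += coeff**2 * k**(2 * n)
        # ratio of successive coefficients is (2n + 1) / (2n + 2)
        coeff *= (2 * n + 1) / (2 * n + 2)
    return 0.5 * math.pi * total

def pendulum_period(theta0, length=1.0, g=9.81, n_terms=50):
    """Period from Eq. (1.14) with k = sin(theta0 / 2)."""
    k = math.sin(0.5 * theta0)
    return 4.0 * math.sqrt(length / g) * elliptic_k_series(k, n_terms)

small_angle_period = 2.0 * math.pi * math.sqrt(1.0 / 9.81)  # Eq. (1.7)
print(pendulum_period(0.01) / small_angle_period)  # close to 1
print(pendulum_period(2.0) / small_angle_period)   # noticeably larger than 1
```

For small $\theta_0$ the result approaches the harmonic value (1.7), while for large amplitudes many more terms are needed because the series converges slowly as $k \to 1$.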


Imagine now the inverse problem: the period $\tau$ is given and the initial angle $\theta_0$
is unknown. Again, we could expand the integrand in a power series and solve
the corresponding polynomial for $\theta_0$. However, such an approach would be very
inefficient for two reasons: first of all, it is impossible to find analytically the
roots of a general polynomial of order $N > 4$,¹ and, secondly, at
which value of $N$ should we truncate the power series if $\theta_0$ is unknown? A glance in
a book on special functions might give us a better, i.e. more convenient, alternative.
Indeed, the inverse function of the elliptic integral $K_1(k)$ with respect to $k$ can be
given explicitly in terms of JACOBI elliptic functions [9-12]. Series expansions of
these functions have been developed such that we can approximate $\theta_0$ by truncating
the respective series.

This example helped to illustrate that we depend on numerical approximations
of definite expressions in a multitude of cases. Even if a numerically approximate
solution has been found for a particular problem, it is imperative to check quite
carefully whether the approach (i) is justified within the required accuracy and
(ii) allows one to control the error induced in the result. The second point is known
as the stability of a routine. We will discuss this topic in more detail in Sect. 1.4.

1 The roots of a real valued polynomial of order N = 3 or 4 are referred to as Cardano’s or
Ferrari’s solutions [13], respectively.

Throughout this book we will be confronted with numerous methods which
allow approximate solutions of problems similar to the two examples illustrated
above. First of all, we would like to specify the properties we expect these methods
to have. Primarily, the method is to be formulated as an unambiguous mathematical
recipe which can be applied to the set of problems it was designed for. Its
applicability should be well defined and it should allow one to determine an estimate for
the error. Moreover, infinite repetition of the procedure should approximate the exact
result to arbitrary accuracy. In other words, we want the method to be well defined in
algorithmic form. Consequently, let us define an algorithm as a sequence of logical
and arithmetic operations (addition, subtraction, multiplication or division) which
allows one to approximate the solution of the problem under consideration within any
accuracy desired. This implies, of course, that numerical errors will be unavoidable.

Let  us  classify  the  occurring  errors  based  on  the  structure  every  numerical 
routine  follows:  We  have  input-errors,  algorithmic-errors,  and  output-errors  as 
indicated  schematically  in  Fig.  1.2.  This  structural  classification  can  be  refined: 
input-errors  are  divided  into  roundoff  errors  and  measurement  errors  contained  in 
the  input  data;  algorithmic-errors  consist  of  roundoff  errors  during  evaluation  and 
of  methodological  errors  due  to  mathematical  approximations;  finally,  output  errors 
are,  in  fact,  roundoff  errors.  In  Sects.  1.2  and  1.3  we  will  concentrate  on  roundoff 
errors  and  methodological  errors.  Since  in  most  cases  measurement  errors  cannot 
be  influenced  by  the  theoretical  physicist  concerned  with  numerical  modeling,  this 
particular  part  will  not  be  discussed  in  this  book.  However,  we  will  discuss  the 
stability  of  numerical  routines,  i.e.  the  influence  of  slight  modifications  of  the  input 
parameters  on  the  outcome  of  a  particular  algorithm  in  Sect.  1.4. 


Fig. 1.2 Schematic classification of the errors occurring within a numerical procedure




1.2  Roundoff  Errors 


Since every number is stored in a computer using a finite number of digits, we
have to truncate or round every non-terminating number at some point. For instance,
consider $2/3 = 0.666666666666\ldots$, which will be stored as 0.6666666667 if the machine
allows only ten digits. Actually, computers use binary arithmetic (for which even
$0.1_{10} = 0.000110011001100\ldots_2$ is problematic²), but for the moment we shall
ignore this fact since the above example suffices to illustrate the crucial point. Let
$\mathrm{Fl}(x)$ denote the floating-point form of a number $x$ within the numerical range of the
machine. For the above example, i.e. a ten digit storage, we have

\[
  \mathrm{Fl}\left( \frac{2}{3} \right) = 0.6666666667 . \tag{1.16}
\]


This has the consequence that, for instance,
$\mathrm{Fl}(\sqrt{3}) \cdot \mathrm{Fl}(\sqrt{3}) \neq \mathrm{Fl}(\sqrt{3} \cdot \sqrt{3}) = 3$.
However, $\mathrm{Fl}(\sqrt{3}) \cdot \mathrm{Fl}(\sqrt{3}) \approx 3$ within the defined range. Before we continue our
discussion on roundoff errors we have to introduce the concepts of the absolute and
the relative error. We denote the true value of a quantity by $y$ and its approximate
value by $\bar{y}$. Then the absolute error $\epsilon_a$ is defined as

\[
  \epsilon_a = \left| y - \bar{y} \right| , \tag{1.17}
\]

while the relative error $\epsilon_r$ is given by

\[
  \epsilon_r = \left| \frac{y - \bar{y}}{y} \right| , \tag{1.18}
\]

provided that $y \neq 0$. In most applications, the relative error is more significant. This
is illustrated in Table 1.1, where it is intuitively obvious that in the second case the
approximate value is much better although the absolute error is the same for both
examples.

Table 1.1 Illustration of the significance of the relative error

        $y$        $\bar{y}$    $\epsilon_a$    $\epsilon_r$
(1)     0.1        0.09         0.01            0.1
(2)     1000.0     999.99       0.01            0.00001

2 A disastrous effect of this binary approximation of 0.1 was discussed by T. Chartier [14].

Let us have a look at the relative error of an arbitrary number stored to the $k$-th
digit: We can write an arbitrary number $y$ in the form
$y = 0.d_1 d_2 \ldots d_k d_{k+1} \ldots \times 10^n$ with $d_1 \neq 0$ and $n \in \mathbb{Z}$.
Accordingly, we write its approximate value as
$\bar{y} = 0.d_1 d_2 d_3 \ldots d_k \times 10^n$, where $k$ is the maximum number of digits stored by the
machine. Hence we obtain for the relative error

\[
\begin{aligned}
  \epsilon_r &= \frac{\left| 0.d_1 d_2 \ldots d_k d_{k+1} \ldots \times 10^n - 0.d_1 d_2 \ldots d_k \times 10^n \right|}
                     {\left| 0.d_1 d_2 \ldots d_k d_{k+1} \ldots \times 10^n \right|} \\
             &= \frac{\left| 0.d_{k+1} d_{k+2} \ldots \times 10^{n-k} \right|}
                     {\left| 0.d_1 d_2 d_3 \ldots \times 10^n \right|} \\
             &= \frac{\left| 0.d_{k+1} d_{k+2} \ldots \right|}
                     {\left| 0.d_1 d_2 d_3 \ldots \right|} \, 10^{-k} \\
             &\leq \frac{1}{0.1} \, 10^{-k} \\
             &= 10^{-k+1} .
\end{aligned} \tag{1.19}
\]

In the last steps we used that, since $d_1 \neq 0$, we have $0.d_1 d_2 d_3 \ldots \geq 0.1$
and accordingly $0.d_{k+1} d_{k+2} \ldots < 1$. If the last digit had been rounded (for
$d_{k+1} \geq 5$ we set $d_k = d_k + 1$, otherwise $d_k$ remains unchanged) instead of simply
truncated, the relative error of a variable $y$ would be bounded by $\epsilon_r \leq 0.5 \cdot 10^{-k+1}$.
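The bounds for truncation and rounding can be checked with a small experiment. The sketch below uses Python's standard `decimal` module to mimic a machine with $k = 10$ decimal digits; the helper names `stored` and `relative_error` are assumptions introduced for this illustration, and the 28-digit default precision of `decimal` merely stands in for the "exact" value.

```python
from decimal import Decimal, Context, ROUND_DOWN, ROUND_HALF_UP

def stored(value, digits, rounding):
    """Return `value` kept to `digits` significant decimal digits."""
    return Context(prec=digits, rounding=rounding).plus(value)

def relative_error(exact, approx):
    """Relative error as defined in Eq. (1.18)."""
    return abs((exact - approx) / exact)

k = 10
y = Decimal(2) / Decimal(3)            # reference value with 28 digits
chopped = stored(y, k, ROUND_DOWN)     # simple truncation
rounded = stored(y, k, ROUND_HALF_UP)  # rounding of the last digit

print(chopped, relative_error(y, chopped))
print(rounded, relative_error(y, rounded))
```

Both relative errors stay below the respective bounds $10^{-k+1}$ and $0.5 \cdot 10^{-k+1}$. A real machine works with binary digits, so this decimal model only illustrates the argument and not the actual hardware behavior.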

Whenever an arithmetic operation is performed, the errors of the variables
involved are transferred to the result [15]. This can occur in an advantageous or
disadvantageous way, where we understand disadvantageous as an increase in
the relative error. Particular care is required when two nearly identical numbers
are subtracted (subtractive cancellation) or when a large number is divided by
a, in comparison, small number. In such cases the roundoff error will increase
dramatically. We note that it might be necessary to avoid such operations in our
aim to design an algorithm which is required to produce reasonable results. An
illustrative example and its remedy will be discussed in Sect. 1.3. However, before
proceeding to the next section we introduce a lower bound to the accuracy which is
achievable with a non-ideal computer, the machine-number. The machine-number
is the smallest positive number $\eta$ which can be added to another number such that a
change in the result is observed. In particular,

\[
  \eta = \min_{\delta} \left\{ \delta > 0 \mid 1 + \delta > 1 \right\} . \tag{1.20}
\]

For a (nonexistent) super-computer, which is capable of saving as many digits
as desired, $\eta$ would be arbitrarily small. A typical value for double-precision in
Fortran or C is $\eta \approx 10^{-16}$.
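A simple way to estimate the machine-number on a given machine is to halve a trial value until adding it to 1 no longer changes the result. The following Python sketch does this for double precision; the function name is an assumption, and the search over powers of two yields the spacing of floating-point numbers near 1 rather than the exact minimum of Eq. (1.20).

```python
import sys

def machine_number():
    """Halve a trial value until adding half of it to 1.0 changes nothing."""
    eta = 1.0
    while 1.0 + 0.5 * eta > 1.0:
        eta *= 0.5
    return eta

print(machine_number())        # of the order of 1e-16
print(sys.float_info.epsilon)  # value reported by the Python runtime
```

On systems using IEEE 754 double precision both numbers are expected to agree.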


1.3  Methodological  Errors 

A methodological error is introduced into the routine whenever a complex
mathematical expression is replaced by an approximate, simpler one. We already came
across an example when we regarded the series representation (1.15) of the elliptic
integral in Sect. 1.1. Although we could evaluate the series up to an arbitrary
order $N$, we are definitely not able to sum up the coefficients to infinite order.
Hence, it is not possible to get rid of methodological errors whenever we have to
deal with expressions we cannot evaluate analytically. Another intriguing example
is the numerical differentiation of a given function. The standard approximation of
a derivative reads

\[
  f'(x_0) = \left. \frac{\mathrm{d}}{\mathrm{d}x} f(x) \right|_{x = x_0}
          \approx \frac{f(x_0 + h) - f(x_0)}{h} . \tag{1.21}
\]


This approximation is referred to as finite difference and will be discussed in more
detail in Chap. 2. One would, in a first guess, expect that the obtained value gets
closer to the true value of the derivative $f'(x_0)$ with decreasing values of $h$. From a
calculus point of view, this is correct since by definition

\[
  \left. \frac{\mathrm{d}}{\mathrm{d}x} f(x) \right|_{x = x_0}
    = \lim_{h \to 0} \frac{f(x_0 + h) - f(x_0)}{h} . \tag{1.22}
\]


However, this is not the case numerically. In particular, one can find a value $\hat{h}$
for which the relative error is minimal, while for values $h < \hat{h}$ and $h > \hat{h}$ the
approximation obtained is worse in comparison. The reason is that for small values
of $h$ the roundoff errors dominate the result since $f(x_0 + h)$ and $f(x_0)$ almost cancel
while $1/h$ is very large. For $h > \hat{h}$, the methodological error, i.e. the replacement of
a derivative by a finite difference, controls the result.
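The competition between roundoff and methodological error can be observed directly. The sketch below applies the finite difference (1.21) to $f(x) = \exp(x)$ at $x_0 = 1$ for step sizes $h = 10^{-1}, \ldots, 10^{-15}$; the test function, the evaluation point and all names are choices made for this illustration only.

```python
import math

def forward_difference(f, x0, h):
    """Finite difference approximation of f'(x0), Eq. (1.21)."""
    return (f(x0 + h) - f(x0)) / h

def relative_errors(f, dfdx, x0, exponents):
    """Relative error of the finite difference for h = 10**(-p)."""
    exact = dfdx(x0)
    return {p: abs((forward_difference(f, x0, 10.0 ** (-p)) - exact) / exact)
            for p in exponents}

errors = relative_errors(math.exp, math.exp, 1.0, range(1, 16))
best = min(errors, key=errors.get)  # exponent p with the smallest error

for p, err in errors.items():
    print(f"h = 1e-{p:02d}   relative error = {err:.2e}")
```

The printed errors first decrease with $h$ and then grow again once roundoff takes over; the exponent stored in `best` marks the approximate location of $\hat{h}$ for this particular function.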

We give one further example [16] in order to illustrate the interplay between
methodological errors and roundoff errors. We regard the apparently harmless
numerical solution of a quadratic equation

\[
  a x^2 + b x + c = 0 , \tag{1.23}
\]

where $a, b, c \in \mathbb{R}$, $a \neq 0$. The well known solutions read

\[
  x_1 = \frac{-b + \sqrt{b^2 - 4ac}}{2a}
  \quad \text{and} \quad
  x_2 = \frac{-b - \sqrt{b^2 - 4ac}}{2a} . \tag{1.24}
\]


Cautious because of the above examples, we immediately diagnose the danger of a
subtractive cancellation in the expression for $x_1$ for $b > 0$ or in $x_2$ for $b < 0$, and
rewrite the above expression for $x_1$:

\[
  x_1 = \frac{\left( -b + \sqrt{b^2 - 4ac} \right) \left( -b - \sqrt{b^2 - 4ac} \right)}
             {2a \left( -b - \sqrt{b^2 - 4ac} \right)}
      = \frac{2c}{-b - \sqrt{b^2 - 4ac}} . \tag{1.25}
\]

For $x_2$ we obtain

\[
  x_2 = \frac{2c}{-b + \sqrt{b^2 - 4ac}} . \tag{1.26}
\]

Consequently, if $b > 0$, $x_1$ should be calculated using Eq. (1.25) and if $b < 0$,
Eq. (1.26) should be used to calculate $x_2$. Moreover, the above expressions can be
cast into one expression by setting

\[
  x_1 = \frac{q}{a} \quad \text{and} \quad x_2 = \frac{c}{q} , \tag{1.27}
\]

with

\[
  q = -\frac{1}{2} \left[ b + \mathrm{sgn}(b) \sqrt{b^2 - 4ac} \right] . \tag{1.28}
\]

Thus, Eqs. (1.27) and (1.28) can be used to calculate $x_1$ and $x_2$ for any sign of $b$.
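A minimal Python sketch of Eqs. (1.27) and (1.28), compared with the textbook formula (1.24), might look as follows. The function names, the restriction to real roots and the test coefficients are assumptions of this example; `math.copysign` plays the role of $\mathrm{sgn}(b)$ and treats $b = 0$ as positive.

```python
import math

def quadratic_roots(a, b, c):
    """Real roots of a x^2 + b x + c = 0 via Eqs. (1.27) and (1.28)."""
    disc = b * b - 4.0 * a * c
    if a == 0 or disc < 0:
        raise ValueError("expects a != 0 and a non-negative discriminant")
    q = -0.5 * (b + math.copysign(math.sqrt(disc), b))
    if q == 0.0:  # only possible for b = c = 0: double root at zero
        return 0.0, 0.0
    return q / a, c / q

def naive_roots(a, b, c):
    """Direct evaluation of Eq. (1.24), prone to subtractive cancellation."""
    root = math.sqrt(b * b - 4.0 * a * c)
    return (-b + root) / (2.0 * a), (-b - root) / (2.0 * a)

# For b >> |a c| the small root is close to -c / b = -1e-8.
print(quadratic_roots(1.0, 1.0e8, 1.0))
print(naive_roots(1.0, 1.0e8, 1.0))
```

In this example the small root obtained from Eq. (1.24) loses most of its significant digits, whereas the reformulated expressions avoid the cancellation.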


1.4  Stability 

When a new numerical method is designed, stability is the third crucial point after
roundoff errors and methodological errors [17]. We give an introductory definition:

An algorithm, equation or, even more general, a problem is referred to as unstable or
ill-conditioned if small changes in the input cause a large change in the output.

This definition will be followed by a couple of elucidating examples [3].³ To be more
specific, let us now, for instance, consider the following system of equations

\[
\begin{aligned}
  x + y &= 2.0 , \\
  x + 1.01 \, y &= 2.01 .
\end{aligned} \tag{1.29}
\]

These equations are easily solved and give $x = 1.0$ and $y = 1.0$. To make our
point we consider now the case in which the right hand side of the second equation
of (1.29) is subjected to a small perturbation, i.e. we consider in particular the
following system of equations

\[
\begin{aligned}
  x + y &= 2.0 , \\
  x + 1.01 \, y &= 2.02 .
\end{aligned} \tag{1.30}
\]


3  Although  unstable  behavior  is  not  desirable  in  the  first  place  the  discovery  of  unstable  systems 
was  the  birth  of  a  specific  branch  in  physics  called  Chaos  Theory.  We  briefly  comment  on  this 
point  at  the  end  of  this  section. 




The corresponding solution is $x = 0.0$ and $y = 2.0$. We observe that a relative
change of about 0.5 % on the right hand side of the second equation in (1.29) resulted
in a 100 % relative change of the solution. Moreover, if the coefficient of $y$ in the
second equation of (1.29) were 1.0 instead of 1.01, which corresponds to a relative
change of 1 %, the equations would be unsolvable. This is a behavior typical for
ill-conditioned problems which, for obvious reasons, should be avoided whenever
possible.
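The sensitivity of the systems (1.29) and (1.30) can be reproduced with a few lines of Python. The sketch below solves a general $2 \times 2$ system with Cramer's rule and uses exact rational arithmetic (`fractions.Fraction`) so that the observed change stems from the problem itself and not from roundoff; the function name and argument order are assumptions of this example.

```python
from fractions import Fraction

def solve_2x2(a11, a12, a21, a22, b1, b2):
    """Solve a 2x2 linear system with Cramer's rule."""
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ZeroDivisionError("singular system")
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - a21 * b1) / det
    return x, y

one, c = Fraction(1), Fraction("1.01")
original = solve_2x2(one, one, one, c, Fraction(2), Fraction("2.01"))   # Eq. (1.29)
perturbed = solve_2x2(one, one, one, c, Fraction(2), Fraction("2.02"))  # Eq. (1.30)

print(original)   # x = 1, y = 1
print(perturbed)  # x = 0, y = 2
```

The determinant of the coefficient matrix is only 0.01, which is why the small change on the right hand side is amplified so strongly.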

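We give a second example: We consider the following initial value problem

\[
  \begin{cases}
    y'' - 10 y' - 11 y = 0 , \\
    y(0) = 1 , \quad y'(0) = -1 .
  \end{cases} \tag{1.31}
\]

The general solution is readily obtained to be of the form

\[
  y = A \exp(-x) + B \exp(11 x) , \tag{1.32}
\]

with numerical constants $A$ and $B$. The initial conditions yield the unique solution

\[
  y = \exp(-x) . \tag{1.33}
\]

The initial conditions are now changed by two small parameters $\delta, \epsilon > 0$ to give:

\[
  y(0) = 1 + \delta \quad \text{and} \quad y'(0) = -1 + \epsilon . \tag{1.34}
\]

The unique solution which satisfies these initial conditions is:

\[
  \bar{y} = \left( 1 + \frac{11 \delta}{12} - \frac{\epsilon}{12} \right) \exp(-x)
          + \frac{\delta + \epsilon}{12} \exp(11 x) . \tag{1.35}
\]

We calculate the relative error

\[
  \epsilon_r = \left| \frac{y - \bar{y}}{y} \right|
             = \left| \frac{11 \delta - \epsilon}{12} + \frac{\delta + \epsilon}{12} \exp(12 x) \right| , \tag{1.36}
\]

which indicates that the problem is ill-conditioned since for large values of $x$ the
second term definitely dominates the first one.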
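To get a feeling for the numbers involved, the relative error of the perturbed solution can be evaluated for small perturbations. The following Python sketch determines the constants $A$ and $B$ of the general solution from the perturbed initial conditions and compares the result with $\exp(-x)$; the function names and the values $\delta = \epsilon = 10^{-10}$ are assumptions chosen for illustration.

```python
import math

def perturbed_coefficients(delta, eps):
    """A and B of y = A exp(-x) + B exp(11 x) for y(0) = 1 + delta, y'(0) = -1 + eps."""
    b = (delta + eps) / 12.0
    a = 1.0 + delta - b
    return a, b

def relative_error(x, delta, eps):
    """Relative deviation of the perturbed solution from exp(-x)."""
    a, b = perturbed_coefficients(delta, eps)
    exact = math.exp(-x)
    perturbed = a * math.exp(-x) + b * math.exp(11.0 * x)
    return abs((exact - perturbed) / exact)

for x in (0.0, 1.0, 2.0, 3.0):
    print(x, relative_error(x, 1e-10, 1e-10))
```

Although the initial data differ only in the tenth digit, the relative error reaches order one at around $x = 2$ and keeps growing like $\exp(12x)$.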

Another, no less serious kind of problem is induced instability:

A method is referred to as induced unstable if a small error at one point of the calculation
induces a large error at some subsequent point.

Induced instability is particularly dangerous since small roundoff errors are
unavoidable in most calculations. Hence, if some part of the whole algorithm is
ill-conditioned, the final output will be dominated by the error induced in such a
way. Again, an example will help to illustrate such behavior. The definite integral

\[
  I_n = \int_0^1 \mathrm{d}x \, x^n \exp(x - 1) , \tag{1.37}
\]

is considered. Integration by parts yields

\[
  I_n = 1 - n I_{n-1} . \tag{1.38}
\]

This expression can be used to recursively calculate $I_n$ from $I_0$, where

\[
  I_0 = 1 - \exp(-1) . \tag{1.39}
\]

Although the recursion formula (1.38) is exact we will run into massive problems
using it. The reason is easily illustrated:

\[
\begin{aligned}
  I_n &= 1 - n I_{n-1} \\
      &= 1 - n + n(n-1) I_{n-2} \\
      &= 1 - n + n(n-1) - n(n-1)(n-2) I_{n-3} \\
      &\;\;\vdots \\
      &= 1 + \sum_{k=1}^{n-1} (-1)^k \frac{n!}{(n-k)!} + (-1)^n \, n! \, I_0 .
\end{aligned} \tag{1.40}
\]

Thus, the initial roundoff error included in the numerical value of $I_0$ is multiplied
by $n!$. Note that for large $n$ we have according to Stirling’s approximation

\[
  n! \approx \sqrt{2\pi} \, n^{n + \frac{1}{2}} \exp(-n) , \tag{1.41}
\]

i.e. an initial error increases almost as $n^n$.

However, Eq. (1.38) can be reformulated to give

\[
  I_n = \frac{1}{n+1} \left( 1 - I_{n+1} \right) , \tag{1.42}
\]

and this opens an alternative method for a recursive calculation of $I_n$. We can start
with some value $N \gg n$ and simply set $I_N = 0$. The error introduced in such a way
may in the end not be acceptable; nevertheless, it decreases with every iteration step
due to the division by $n + 1$ in Eq. (1.42).
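Both recursions are easily compared in a short Python sketch. The function names and the choice of 20 extra backward steps (i.e. $N = n + 20$) are assumptions of this example; since $\exp(-1) \, x^n \leq x^n \exp(x - 1) \leq x^n$ on $[0, 1]$, the exact value must satisfy $1/[\mathrm{e} (n+1)] \leq I_n \leq 1/(n+1)$, which provides a simple plausibility check.

```python
import math

def forward(n):
    """I_n from I_0 = 1 - exp(-1) via Eq. (1.38); induced unstable."""
    value = 1.0 - math.exp(-1.0)
    for m in range(1, n + 1):
        value = 1.0 - m * value
    return value

def backward(n, extra=20):
    """I_n from the crude guess I_N = 0 with N = n + extra via Eq. (1.42)."""
    value = 0.0
    for m in range(n + extra - 1, n - 1, -1):
        value = (1.0 - value) / (m + 1)
    return value

for n in (5, 10, 15, 20, 25):
    print(n, forward(n), backward(n))
```

For small $n$ both routines agree, while for larger $n$ the forward recursion leaves the admissible interval and the backward recursion remains inside it.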

Having discussed some basic features of stability in numerical algorithms we
would like to add a few remarks on Chaos Theory. Chaos theory investigates
dynamical processes which are very sensitive to initial conditions. One of the
best known examples of such behavior is weather prediction. Although
Poincaré already observed chaotic behavior while working on the three body
problem, one of the pioneers of chaos theory was E.N. Lorenz [18] (not to be
confused with H. Lorentz, who introduced the LORENTZ transformation). In 1961
he ran weather simulations on a computer of restricted capacity. However, when
he tried to reproduce one particular result by restarting the calculation with
parameters calculated the day before, he observed that the outcome was completely
different [19]. The reason was that the equations he dealt with were ill-conditioned,
and the roundoff error he introduced by simply typing in the numbers of the
graphical output increased drastically and, hence, produced a completely different
result. Nowadays, various physical systems are known which indeed behave in such
a way. Further examples are turbulence in fluids, oscillations in electrical circuits,
oscillating chemical reactions, population growth in ecology, and the time evolution of
the magnetic field of celestial bodies.

It is important to note that chaotic behavior in such systems is
deterministic, yet unpredictable. This is due to the impossibility of an exact knowledge of
the initial conditions required to predict, for instance, the weather over a reasonably
long period. This feature is referred to as the butterfly effect: a hurricane can
form because a butterfly flapped its wings several weeks before. However, these
effects have nothing to do with intrinsically probabilistic properties, which are solely
a feature of quantum mechanics. In contrast, in chaos theory the future is
uniquely determined by the initial conditions but still unpredictable. This is often
referred to as deterministic chaos.

It  has  to  be  emphasized  that  chaos  in  physical  systems  is  a  consequence  of  the 
equations  describing  the  processes  and  not  a  consequence  of  the  numerical  method 
used  for  modeling.  Therefore,  it  is  important  to  distinguish  between  the  stability  of 
a  numerical  method  and  the  stability  of  a  physical  system  in  general. 

We  will  come  across  chaotic  behavior  again  in  Sect.  6.3  where  we  discuss  chaotic 
behavior  in  the  dynamics  of  the  double  pendulum  [4-8] . 


1.5  Concluding  Remarks 

In  this  chapter  we  dealt  with  the  basic  features  of  numerical  errors  one  is  always 
confronted  with  when  developing  an  algorithm.  One  point  we  neglected  in  our 
discussion  is  the  computational  cost ,  i.e.  the  time  a  program  needs  to  be  executed. 
Although  this  is  a  very  important  point,  it  is  beyond  the  scope  of  this  book.  However, 
one  has  to  find  a  balance  between  the  need  of  achieving  the  most  accurate  result  and 
the  computing  time  required  to  achieve  it.  The  most  accurate  result  is  useless  if 
the programmer does not get the result within his lifetime. D. Adams [20] put it in a nutshell: the super-computer Deep Thought was asked to compute the answer to “The Ultimate Question of Life, the Universe and Everything”, quote:

“How  long?”  he  said. 

“Seven  and  a  half  million  years.” 

Another quite crucial point, which we neglected so far, is the error analysis of a computational method which is based on random numbers (in fact these are pseudo-random numbers; this point will be discussed in the second part of this book). In
this  case,  the  situation  changes  completely,  because,  similar  to  experimental  results, 
the  observed  values  are  distributed  around  a  mean  with  a  certain  variance.  Such 
results  have  to  be  interpreted  within  a  statistical  context.  However,  it  turns  out 
that  for  many  problems  the  computational  efficiency  can  be  significantly  increased 
using  such  methods.  Typical  applications  are  estimates  of  integrals  or  solutions  to 
optimization  problems.  Such  topics  will  be  treated  in  the  second  part  of  this  book. 

Part I

Deterministic  Methods 


Chapter  2 

Numerical  Differentiation 


2.1  Introduction 

This  chapter  is  the  first  of  two  systematic  introductions  to  the  numerical  treatment 
of  differential  equations.  Differential  equations  and,  thus,  derivatives  and  integrals 
are  of  eminent  importance  in  the  modern  formulation  of  natural  sciences  and 
in  particular  of  physics.  Very  often  the  complexity  of  the  expressions  involved 
does not allow an analytical approach, although modern symbolic software can ease a physicist's life significantly. Thus, in many cases a numerical treatment is unavoidable and one should be prepared.

We  introduce  here  the  notion  of  finite  differences  as  a  basic  concept  of  numerical 
differentiation  [1-3].  In  contrast,  the  next  chapter  will  deal  with  the  concepts 
of numerical quadrature. Together, these two chapters will set the stage for a comprehensive discussion of algorithms designed to solve differential equations numerically. In particular, the solution of ordinary differential equations will always be based on an integration.

This  chapter  is  composed  of  four  sections.  The  first  repeats  some  basic  concepts 
of calculus and formally introduces finite differences. The second formulates approximations to derivatives based on finite differences, while the third section
includes  a  more  systematic  approach  based  on  an  operator  technique.  It  allows 
an  arbitrarily  close  approximation  of  derivatives  with  the  advantage  that  the 
expressions  discussed  in  this  section  can  immediately  be  applied  to  the  problems 
at  hand.  The  chapter  is  concluded  with  a  discussion  of  some  additional  aspects. 


2.2  Finite  Differences 

Let us consider a smooth function $f(x)$ on the finite interval $[a, b] \subset \mathbb{R}$ of the real axis. The interval $[a, b]$ is divided into $N - 1 \in \mathbb{N}$ equally spaced sub-intervals of the form $[x_i, x_{i+1}]$, where $x_1 = a$, $x_N = b$. Obviously, $x_i$ is then given by

$$ x_i = a + (i - 1)\,\frac{b - a}{N - 1}, \qquad i = 1, \ldots, N . \tag{2.1} $$

We introduce the distance $h$ between two grid-points $x_i$ by:

$$ h = x_{i+1} - x_i = \frac{b - a}{N - 1}, \qquad \forall\, i = 1, \ldots, N - 1 . \tag{2.2} $$

For the sake of a more compact notation we restrict our discussion to equally spaced grid-points, keeping in mind that the extension to arbitrarily spaced grid-points by replacing $h$ by $h_i$ is straightforward and leaves the discussion essentially unchanged.
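The grid of Eqs. (2.1) and (2.2) can be sketched in a few lines of Python. This is only an illustration: the function name `uniform_grid` is an assumption made here and does not appear in the book, and the list index is shifted by one because Python counts from zero.

```python
def uniform_grid(a, b, n):
    """Return the n grid-points x_1..x_n of Eq. (2.1) and the spacing h of Eq. (2.2)."""
    if n < 2:
        raise ValueError("need at least two grid-points")
    h = (b - a) / (n - 1)
    # Python lists are 0-based, so x[i - 1] holds the book's x_i.
    x = [a + (i - 1) * h for i in range(1, n + 1)]
    return x, h


x, h = uniform_grid(0.0, 1.0, 5)
print(x, h)
```

For a non-uniform grid one would store the individual spacings $h_i$ instead of a single $h$.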

Note that the number of grid-points and, thus, their distance $h$, has to be chosen in such a way that the function $f(x)$ can be sufficiently well approximated by its function values $f(x_i)$ as indicated in Fig. 2.1. We understand by sufficiently well approximated that some interpolation scheme in the interval $[x_i, x_{i+1}]$ will reproduce the function $f(x)$ within a required accuracy. In cases where the function is strongly varying within some sub-interval $[c, d] \subset [a, b]$ and is slowly varying within $[a, b] \setminus [c, d]$ it might be advisable to use variable grid-spacing in order to reduce the computational cost of the procedure.

Fig. 2.1 We define equally spaced grid-points $x_i$ on a finite interval on the real axis in such a way that the function $f(x)$ is sufficiently well approximated by its functional values $f(x_i)$ at these grid-points

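We introduce the following notation: The function value of $f(x)$ at the grid-point $x_i$ will be denoted by $f_i \equiv f(x_i)$ and its $n$-th derivative:

$$ f_i^{(n)} \equiv f^{(n)}(x_i) = \left.\frac{d^n f(x)}{dx^n}\right|_{x = x_i} . \tag{2.3} $$

Furthermore, we define for arbitrary $\xi \in [x_i, x_{i+1})$

$$ f_{i+\epsilon}^{(n)} = f^{(n)}(\xi) , \tag{2.4} $$

where $f_{i+\epsilon}^{(0)} \equiv f_{i+\epsilon}$ and $\epsilon$ is chosen to give:

$$ \xi = x_i + \epsilon h , \qquad \epsilon \in [0, 1) . \tag{2.5} $$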

Let us remember some basics from calculus: The first derivative, denoted $f'(x)$, of a function $f(x)$ which is smooth within the interval $[a, b]$, i.e. $f(x) \in C^{\infty}[a, b]$, is defined for arbitrary $x \in [a, b]$ as

$$ f'(x) := \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} = \lim_{h \to 0} \frac{f(x) - f(x - h)}{h} = \lim_{h \to 0} \frac{f(x + h) - f(x - h)}{2h} . \tag{2.6} $$


However, it is impossible to take the limit $h \to 0$ numerically, as discussed in Sect. 1.3, Eq. (1.22). This manifests itself in a non-negligible error due to subtractive cancellation.
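The effect can be made visible with a small experiment. The following sketch (an illustration assuming IEEE double precision; the test function $\sin x$ and the point $x_0 = 1$ are arbitrary choices made here) evaluates the forward quotient of Eq. (2.6) for decreasing $h$ and records the deviation from the exact derivative.

```python
import math


def forward_quotient(f, x, h):
    """Difference quotient (f(x + h) - f(x)) / h from the first form of Eq. (2.6)."""
    return (f(x + h) - f(x)) / h


x0 = 1.0
exact = math.cos(x0)            # derivative of sin at x0
errors = {}
for k in (1, 4, 8, 12, 15):
    h = 10.0 ** (-k)
    errors[k] = abs(forward_quotient(math.sin, x0, h) - exact)
    print(f"h = 1e-{k:02d}   error = {errors[k]:.3e}")
```

The error first shrinks with $h$ and then grows again once the difference $f(x + h) - f(x)$ is dominated by rounding, so a smaller $h$ is not automatically better.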

This problem is circumvented by the use of Taylor's theorem. It states that if a function $f(x)$ is $(n + 1)$-times continuously differentiable on the interval $[a, b]$, then $f(x)$ can be expressed in terms of a series expansion at point $x_0 \in [a, b]$:¹

$$ f(x) = \sum_{k=0}^{n} \frac{f^{(k)}(x_0)}{k!}\,(x - x_0)^k + \frac{f^{(n+1)}[\zeta(x)]}{(n + 1)!}\,(x - x_0)^{n+1} , \qquad \forall x \in [a, b] . \tag{2.7} $$

Here, $\zeta(x)$ takes on a value between $x$ and $x_0$. The last term on the right hand side of Eq. (2.7) is commonly referred to as truncation error. (A more general definition of this error was given in Sect. 1.1.)

¹ Note that for $x_0 = 0$ the series expansion (2.7) is referred to as Maclaurin series.


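We introduce now the finite difference operators

$$ \Delta_+ f_i = f_{i+1} - f_i , \tag{2.8a} $$

as the forward difference,

$$ \Delta_- f_i = f_i - f_{i-1} , \tag{2.8b} $$

as the backward difference, and

$$ \Delta_c f_i = f_{i+1} - f_{i-1} , \tag{2.8c} $$

as the central difference. The derivative of $f(x)$ can be approximated with the help of Taylor's theorem (2.7). In a first step we consider (restricting to third order in $h$)

$$ f_{i+1} = f(x_i) + h f'(x_i) + \frac{h^2}{2} f''(x_i) + \frac{h^3}{6} f'''[\zeta(x_i + h)] = f_i + h f_i' + \frac{h^2}{2} f_i'' + \frac{h^3}{6} f'''_{i+\epsilon_\zeta} , \tag{2.9a} $$

with $f_{i+1} \equiv f(x_i + h)$. Here $\epsilon_\zeta$ is the fractional part $\epsilon$ which has to be determined according to $\zeta(x_i + h)$. Analogously we find for $f_{i-1}$

$$ f_{i-1} = f_i - h f_i' + \frac{h^2}{2} f_i'' - \frac{h^3}{6} f'''_{i-\epsilon_\zeta} . \tag{2.9b} $$

Solving Eqs. (2.9) for the derivative $f_i'$ leads directly to the definition of finite difference derivatives.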


2.3  Finite  Difference  Derivatives 


We define the finite difference derivative or difference approximations

$$ D_+ f_i = \frac{\Delta_+ f_i}{h} = \frac{f_{i+1} - f_i}{h} , \tag{2.10a} $$

as the forward difference derivative,

$$ D_- f_i = \frac{\Delta_- f_i}{h} = \frac{f_i - f_{i-1}}{h} , \tag{2.10b} $$

as the backward difference derivative, and

$$ D_c f_i = \frac{\Delta_c f_i}{2h} = \frac{f_{i+1} - f_{i-1}}{2h} , \tag{2.10c} $$

as the central difference derivative. A graphical interpretation of these expressions is straightforward and is presented in Fig. 2.2.

Fig. 2.2 Graphical illustration of different finite difference derivatives. The solid line labeled $f_i'$ represents the real derivative for comparison

² Please note that the symbols $\Delta_+$, $\Delta_-$, and $\Delta_c$ in Eqs. (2.8) are linear operators acting on $f_i$. For a basic introduction to the theory of linear operators see for instance [4, 5].

Using the above definitions (2.10) together with the expansions (2.9) we obtain

$$ f_i' = D_+ f_i - \frac{h}{2} f_i'' - \frac{h^2}{6} f'''_{i+\epsilon_\zeta} = D_- f_i + \frac{h}{2} f_i'' - \frac{h^2}{6} f'''_{i-\epsilon_\zeta} = D_c f_i - \frac{h^2}{12}\left( f'''_{i+\epsilon_\zeta} + f'''_{i-\epsilon_\zeta} \right) . \tag{2.11} $$

We observe that in the central difference approximation of $f_i'$ the truncation error scales like $h^2$ while it scales like $h$ in the other two approximations; thus the central difference approximation should have the smallest methodological error. Note that the error is usually not dominated by the derivatives of $f(x)$ since we assumed that $f(x)$ is a smooth function and sufficiently well approximated on the grid within $[a, b]$. Furthermore we have to emphasize that the central difference approximation is essentially a three point approximation, including $f_{i-1}$, $f_i$ and $f_{i+1}$, although $f_i$ cancels. Thus, we can improve our approximation by taking even more grid-points into account. For instance, we could combine the above finite difference derivatives.

³ The central difference derivative is related to the forward and backward difference derivatives via: $D_c = \frac{1}{2}\left(D_+ + D_-\right)$.
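Let us prepare this step by expanding Eqs. (2.9) to higher order derivatives. We then obtain for the forward difference derivative

$$ D_+ f_i = f_i' + \frac{h}{2} f_i'' + \frac{h^2}{6} f_i''' + \frac{h^3}{24} f_i^{\mathrm{IV}} + \frac{h^4}{120} f_i^{\mathrm{V}} + \ldots , \tag{2.12} $$

for the backward difference derivative

$$ D_- f_i = f_i' - \frac{h}{2} f_i'' + \frac{h^2}{6} f_i''' - \frac{h^3}{24} f_i^{\mathrm{IV}} + \frac{h^4}{120} f_i^{\mathrm{V}} \mp \ldots , \tag{2.13} $$

and, finally, for the central difference derivative

$$ D_c f_i = f_i' + \frac{h^2}{6} f_i''' + \frac{h^4}{120} f_i^{\mathrm{V}} + \ldots . \tag{2.14} $$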


In order to improve the method we have to combine $D_+ f_i$, $D_- f_i$ and $D_c f_i$ from different grid-points in such a way that at least the terms proportional to $h^2$ cancel. This can be achieved by observing that

$$ 8 D_c f_i - D_c f_{i-1} - D_c f_{i+1} = 6 f_i' - \frac{h^4}{5} f_i^{\mathrm{V}} + \ldots , \tag{2.15} $$

which gives

$$ f_i' = \frac{1}{6}\left( 8 D_c f_i - D_c f_{i+1} - D_c f_{i-1} \right) + \frac{h^4}{30} f_i^{\mathrm{V}} = \frac{1}{12h}\left( f_{i-2} - 8 f_{i-1} + 8 f_{i+1} - f_{i+2} \right) + \frac{h^4}{30} f_i^{\mathrm{V}} . \tag{2.16} $$

Note that this simple combination yields an improvement of two orders in $h$! One can even improve the approximation in a similar fashion by simply calculating the derivative from even more points, for instance $f_{i \pm 3}$.
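The different orders in $h$ can be checked numerically. The sketch below is an illustration only: the helper names and the test function $\sin x$ are choices made here. It estimates the exponent $p$ in error $\sim h^p$ for the forward difference (2.10a), the central difference (2.10c), and the five-point formula (2.16) by halving $h$ once.

```python
import math


def d_forward(f, x, h):
    return (f(x + h) - f(x)) / h                      # Eq. (2.10a)


def d_central(f, x, h):
    return (f(x + h) - f(x - h)) / (2.0 * h)          # Eq. (2.10c)


def d_five_point(f, x, h):
    # Eq. (2.16) without the remainder term
    return (f(x - 2 * h) - 8 * f(x - h) + 8 * f(x + h) - f(x + 2 * h)) / (12.0 * h)


def observed_order(rule, f, dfdx, x, h):
    """Estimate p in error ~ h**p by comparing the errors at h and h/2."""
    e1 = abs(rule(f, x, h) - dfdx(x))
    e2 = abs(rule(f, x, h / 2) - dfdx(x))
    return math.log(e1 / e2, 2)


x0, h0 = 1.0, 0.1
orders = {}
for name, rule in [("forward", d_forward),
                   ("central", d_central),
                   ("five-point", d_five_point)]:
    orders[name] = observed_order(rule, math.sin, math.cos, x0, h0)
    print(name, round(orders[name], 2))
```

The printed exponents should lie close to 1, 2 and 4, in line with the leading error terms of Eqs. (2.11) and (2.16).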


⁴ Please note that the Taylor expansion of $(D_c f_{i-1} + D_c f_{i+1})/2 = (f_{i+2} - f_{i-2})/(4h)$ is equivalent to the expansion (2.14) of $D_c f_i$ with $h$ replaced by $2h$.


2.4  A  Systematic  Approach:  The  Operator  Technique

We would like to obtain a general expression which allows us to calculate the finite difference derivatives of arbitrary order up to arbitrary order of $h$ in the truncation error. We achieve this goal by introducing the shift operator $T$ and its inverse operator $T^{-1}$ as⁵

$$ T f_i = f_{i+1} , \tag{2.18} $$

and

$$ T^{-1} f_i = f_{i-1} , \tag{2.19} $$

where $T T^{-1} = 1$ is the unity operator. We can write these operators in terms of the forward and backward difference operators $\Delta_+$ and $\Delta_-$ of Eqs. (2.8), in particular

$$ T = 1 + \Delta_+ , \tag{2.20} $$

and

$$ T^{-1} = 1 - \Delta_- . \tag{2.21} $$

Moreover, if $D \equiv d/dx$ denotes the derivative operator and if the $n$-th power of this operator $D$ is understood as the $n$-th successive application of it, we can rewrite the Taylor expansions (2.9) as

$$ f_{i+1} = \left( 1 + hD + \frac{1}{2} h^2 D^2 + \frac{1}{3!} h^3 D^3 + \ldots \right) f_i \equiv \exp(hD)\, f_i , \tag{2.22} $$

and

$$ f_{i-1} = \left( 1 - hD + \frac{1}{2} h^2 D^2 - \frac{1}{3!} h^3 D^3 \pm \ldots \right) f_i \equiv \exp(-hD)\, f_i . \tag{2.23} $$

Hence, we find that [7]⁶

$$ T = 1 + \Delta_+ \equiv \exp(hD) , \tag{2.24} $$

and, accordingly, that

$$ T^{-1} = 1 - \Delta_- \equiv \exp(-hD) . \tag{2.25} $$

Finally, we obtain the central difference operator:

$$ \Delta_c = T - T^{-1} = \exp(hD) - \exp(-hD) = 2 \sinh(hD) . \tag{2.26} $$

⁵ We note in passing that the shift operators form the discrete translational group, a very important group in theoretical physics. Let $T(n) = T^n$ denote the shift by $n \in \mathbb{N}$ grid-points. We then have

$$ T(n)\,T(m) = T(n + m) , \tag{2.17a} $$

$$ T(0) = 1 , \tag{2.17b} $$

and

$$ T(n)^{-1} = T(-n) , \tag{2.17c} $$

which are the properties required to form a group. Here 1 denotes unity. Moreover, we have

$$ T(n)\,T(m) = T(m)\,T(n) , \tag{2.17d} $$

i.e. it is an Abelian group. The group of discrete translations is usually denoted by $T_d$ [6].


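Equations (2.24), (2.25) and (2.26) can be inverted for $hD$:

$$ hD = \begin{cases} \ln(1 + \Delta_+) = \Delta_+ - \frac{1}{2}\Delta_+^2 + \frac{1}{3}\Delta_+^3 \mp \ldots , \\[4pt] -\ln(1 - \Delta_-) = \Delta_- + \frac{1}{2}\Delta_-^2 + \frac{1}{3}\Delta_-^3 + \ldots , \\[4pt] \sinh^{-1}\left( \frac{\Delta_c}{2} \right) = \frac{\Delta_c}{2} - \frac{1}{3!}\left( \frac{\Delta_c}{2} \right)^3 + \frac{3^2}{5!}\left( \frac{\Delta_c}{2} \right)^5 \mp \ldots . \end{cases} \tag{2.27} $$

Again, the $n$-th power of an operator $K$ (with $K = \Delta_+, \Delta_-, \Delta_c$), $K^n f_i$, is understood as the $n$-th successive action of the operator $K$ on $f_i$, i.e. $K^{n-1}(K f_i)$. Expression (2.27) allows us to approximate the derivatives up to arbitrary order using finite differences. Furthermore, we can take the $k$-th power of Eq. (2.27) in order to get an approximate $k$-th derivative, $(hD)^k$ [7].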
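The first line of Eq. (2.27) translates directly into a small routine. The sketch below is an illustration under assumptions made here (the function names are hypothetical and $\exp x$ at $x = 0$ serves as the test case): powers of $\Delta_+$ are built by repeated differencing, and the logarithmic series is truncated after a given number of terms.

```python
import math


def forward_differences(values, n_max):
    """Return [D f_0, D^2 f_0, ..., D^n_max f_0], D = forward difference operator."""
    powers, current = [], list(values)
    for _ in range(n_max):
        # one more application of the forward difference operator
        current = [current[j + 1] - current[j] for j in range(len(current) - 1)]
        powers.append(current[0])
    return powers


def derivative_from_series(f, x, h, n_terms):
    """Truncate hD = D - D^2/2 + D^3/3 -+ ... (first line of Eq. (2.27))."""
    samples = [f(x + j * h) for j in range(n_terms + 1)]
    powers = forward_differences(samples, n_terms)
    hd = sum((-1) ** (n + 1) * powers[n - 1] / n for n in range(1, n_terms + 1))
    return hd / h


errors = []
for n_terms in (1, 2, 3, 4):
    approx = derivative_from_series(math.exp, 0.0, 0.1, n_terms)
    errors.append(abs(approx - 1.0))      # exact derivative of exp at 0 is 1
    print(n_terms, errors[-1])
```

Each additional term of the series uses one more grid-point to the right of $x_i$ and reduces the error for this smooth test function.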

However, it turns out that the expansion (2.27) in terms of the central difference $\Delta_c$ does not optimally use the grid because it contains only odd powers of $\Delta_c$. For instance, the third power $\Delta_c^3 f_i$ includes the function values $f_{i\pm3}$ and $f_{i\pm1}$ at ‘odd’ grid-points but ignores the function values $f_i$ and $f_{i\pm2}$ at ‘even’ grid-points. Since this is true for all odd powers of $\Delta_c$ we observe that the expansion (2.27) uses only half of the grid. On the other hand, if one computes the square $(hD)^2$ of (2.27) only ‘even’ grid-points are used, while the ‘odd’ grid-points are ignored. This reduces the accuracy of the method and an improvement is required. The easiest remedy is to formally introduce function values $T^{\pm 1/2} f_i = f_{i \pm 1/2}$ at intermediate grid-points $x_{i \pm 1/2} = x_i \pm h/2$. This definition allows us to introduce the central difference operator $\delta_c$ of intermediate grid-points,

$$ \delta_c = T^{1/2} - T^{-1/2} = 2 \sinh\left( \frac{hD}{2} \right) , \tag{2.28} $$

and the average operator:

$$ \mu = \frac{1}{2}\left( T^{1/2} + T^{-1/2} \right) = \cosh\left( \frac{hD}{2} \right) . \tag{2.29} $$

The central difference operator $\Delta_c$ on the grid is connected to the central difference operator $\delta_c$ of intermediate grid-points by:

$$ \Delta_c = 2 \mu \delta_c . \tag{2.30} $$

To avoid the problem of Eq. (2.27) that only odd or even grid-points are accounted for we replace all operators $\Delta_c/2$ by $\delta_c$ and then multiply the right hand side of Eq. (2.27) by $\mu$. This ensures that function values at intermediate grid-points will not appear in the final expression. Hence, we obtain for the first order derivative operator:

$$ hD = \begin{cases} \Delta_+ - \frac{1}{2}\Delta_+^2 + \frac{1}{3}\Delta_+^3 \mp \ldots , \\[4pt] \Delta_- + \frac{1}{2}\Delta_-^2 + \frac{1}{3}\Delta_-^3 + \ldots , \\[4pt] \mu\delta_c - \frac{1}{3!}\,\mu\delta_c^3 \pm \ldots . \end{cases} \tag{2.31} $$

⁶ This representation of the shift operator $T$ explains why the derivative operator $D$ is frequently referred to as the infinitesimal generator of translations [6].


When higher order derivatives are calculated, we replace, again, $\Delta_c/2$ by $\delta_c$ and multiply odd powers of $\delta_c$ by $\mu$. This procedure results, for instance, in the second order derivative operator:


(2.32) 


In particular, we obtain for the central difference derivative

$$ f_i' = \frac{f_{i+1} - f_{i-1}}{2h} + \mathcal{O}(h^2) , \tag{2.33} $$

and

$$ f_i'' = \frac{f_{i+1} - 2 f_i + f_{i-1}}{h^2} + \mathcal{O}(h^2) . \tag{2.34} $$

Here, $\mathcal{O}(h^2)$ indicates that this term is of the order of $h^2$ and we get the important result that the truncation error is of the order $\mathcal{O}(h^2)$.⁸

⁷ These intermediate grid-points are virtual, auxiliary grid-points which will be eliminated in due course.
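A quick numerical check of the second derivative formula (2.34) may be helpful. The following sketch (test function $\sin x$ and evaluation point chosen here for illustration) compares the error at $h$ and $h/2$.

```python
import math


def second_central(f, x, h):
    """Eq. (2.34) without the O(h^2) remainder."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / h ** 2


x0 = 0.5
exact = -math.sin(x0)                     # second derivative of sin
err_h = abs(second_central(math.sin, x0, 0.1) - exact)
err_h2 = abs(second_central(math.sin, x0, 0.05) - exact)
ratio = err_h / err_h2
print(err_h, err_h2, ratio)
```

Halving $h$ should reduce the error by roughly a factor of four, which is what a truncation error of order $h^2$ implies.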


2.5  Concluding  Discussion 

First of all, although Eq. (2.27) allows us to approximate a derivative of any order $k$ arbitrarily closely, it is still an infinite series which leaves us with the decision at which order to truncate. This choice will highly depend on the choice of $h$ which in turn depends on the function we would like to differentiate. Consider, for instance, the periodic function

$$ f(x) = \exp(i\omega x) , \tag{2.35} $$

where $\omega \in \mathbb{R}$ and $i$ is the imaginary unit with $i^2 = -1$. Its first derivative is

$$ f'(x) = i\omega \exp(i\omega x) . \tag{2.36} $$

We now introduce grid-points by

$$ x_k = x_0 + kh , \tag{2.37} $$

where $h$ is the grid-spacing and $x_0$ is some finite starting point on the real axis. Accordingly,

$$ f_k = \exp[i\omega(x_0 + kh)] , \tag{2.38} $$

and the exact value of the first derivative is

$$ f_k' = i\omega \exp[i\omega(x_0 + kh)] = i\omega f_k . \tag{2.39} $$

⁸ The leading order of the truncation error can be determined by inserting the dominant contribution of Eqs. (2.28) and (2.29) into the remainder of Eqs. (2.31) and (2.32), respectively. For instance, it follows from Eq. (2.29) that $\mu \sim \mathcal{O}(1)$ and from Eq. (2.28) that $\delta_c \sim \mathcal{O}(h)$ and, hence, we find with the help of Eq. (2.31) that $\mu\delta_c^3/h \sim \mathcal{O}(h^2)$. Analogously, we obtain from Eq. (2.32) that $\delta_c^4/h^2 \sim \mathcal{O}(h^2)$.

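We calculate the forward, backward, and central difference derivatives according to Eqs. (2.10) and obtain

$$ D_+ f_k = i\omega f_k \exp\left( \frac{ih\omega}{2} \right) \mathrm{sinc}\left( \frac{h\omega}{2} \right) , \tag{2.40a} $$

with $\mathrm{sinc}(x) = \sin(x)/x$ and

$$ D_- f_k = i\omega f_k \exp\left( -\frac{ih\omega}{2} \right) \mathrm{sinc}\left( \frac{h\omega}{2} \right) , \tag{2.40b} $$

and

$$ D_c f_k = i\omega f_k\, \mathrm{sinc}(h\omega) . \tag{2.40c} $$

We divide the approximate derivatives by the true value (2.39) and take the modulus. We get

$$ \left| \frac{D_+ f_k}{f_k'} \right| = \left| \frac{D_- f_k}{f_k'} \right| = \left| \mathrm{sinc}\left( \frac{h\omega}{2} \right) \right| , \tag{2.41} $$

and

$$ \left| \frac{D_c f_k}{f_k'} \right| = \left| \mathrm{sinc}(h\omega) \right| . \tag{2.42} $$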


Since $|\sin(x)| \le |x|$, $\forall x \in \mathbb{R}$, we obtain that in all three cases this ratio is less than one independent of $h$, unless $\omega = 0$. (Please keep in mind that $\mathrm{sinc}(x) \to 1$ as $x \to 0$.) Hence, the first order finite difference approximations underestimate the true value of the derivative. The reason is easily found: $f(x)$ oscillates with frequency $\omega$ while the finite difference derivatives applied here approximate the derivative linearly. Higher order corrections will, of course, improve the approximation significantly. Furthermore, we observe that the one-sided finite difference derivatives (2.40a) and (2.40b) are exactly zero if $h\omega = 2n\pi$, $n \in \mathbb{N}$, i.e. if the grid-spacing $h$ matches a multiple of the period $2\pi/\omega$ of the function $f(x)$. The same occurs when central derivatives (2.40c) are used, but now for $h\omega = n\pi$. This is not really a problem in our example because we choose the grid-spacing $h \ll 2\pi/\omega$ in order to approximate the function $f(x)$ sufficiently well. However, in many cases the analytic form of the function is unknown and we only have its representation on the grid. In this case one has to check carefully by changing $h$ whether the function is periodic or not.
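The ratios (2.41) and (2.42) are easy to confirm numerically. The sketch below is illustrative only; the values $\omega = 2$, $h = 0.3$, $x_0 = 0$, $k = 3$ and the helper names are assumptions made here. It evaluates the difference quotients for $f(x) = \exp(i\omega x)$ with Python's `cmath` module and compares them with the sinc expressions.

```python
import cmath
import math


def sinc(x):
    """sin(x)/x with the limiting value 1 at x = 0."""
    return 1.0 if x == 0.0 else math.sin(x) / x


def ratios(omega, h, x0=0.0, k=3):
    """Return |D+ f_k / f_k'| and |Dc f_k / f_k'| for f(x) = exp(i*omega*x)."""
    def f(x):
        return cmath.exp(1j * omega * x)

    xk = x0 + k * h                                   # Eq. (2.37)
    exact = 1j * omega * f(xk)                        # Eq. (2.39)
    d_plus = (f(xk + h) - f(xk)) / h                  # Eq. (2.10a)
    d_cent = (f(xk + h) - f(xk - h)) / (2.0 * h)      # Eq. (2.10c)
    return abs(d_plus / exact), abs(d_cent / exact)


omega, h = 2.0, 0.3
r_plus, r_cent = ratios(omega, h)
print(r_plus, abs(sinc(h * omega / 2)))   # both sides of Eq. (2.41)
print(r_cent, abs(sinc(h * omega)))       # both sides of Eq. (2.42)
```

Calling `ratios(omega, math.pi / omega)` gives a central ratio that vanishes up to rounding, which is the case $h\omega = n\pi$ with $n = 1$ discussed above.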
We discuss, finally, how to approximate partial derivatives of functions which depend on more than one variable. Basically this can be achieved by independently discretizing the function of interest in each particular variable and then by defining the corresponding finite difference derivatives. We will briefly discuss the case of two variables; the extension to even more variables is straightforward. We regard a function $g(x, y)$ where $(x, y) \in [a, b] \times [c, d]$. We denote the grid-spacing in $x$-direction by $h_x$ and in $y$-direction by $h_y$. Derivatives of the form $\frac{\partial^n}{\partial x^n} g(x, y)$ or $\frac{\partial^n}{\partial y^n} g(x, y)$ for arbitrary $n$ are approximated with the help of the schemes discussed above; only the respective grid-spacing has to be accounted for. We will now briefly discuss mixed partial derivatives, in particular the derivative $\frac{\partial^2}{\partial x \partial y} g(x, y)$. Higher orders can be easily obtained in the same fashion. Here, we will restrict ourselves to the case of the central difference derivative. Again, the extension to the other two forms of derivatives is straightforward. We would like to approximate the derivative at the point $(a + i h_x, c + j h_y)$, which will be abbreviated by $(i, j)$. Hence, we compute

$$ \left. \frac{\partial}{\partial y} \frac{\partial}{\partial x} g(x, y) \right|_{(i,j)} = \frac{1}{2 h_x} \left[ \left. \frac{\partial}{\partial y} g(x, y) \right|_{(i+1,j)} - \left. \frac{\partial}{\partial y} g(x, y) \right|_{(i-1,j)} \right] + \mathcal{O}(h_x^2) $$

$$ = \frac{1}{2 h_x} \left[ \frac{g_{i+1,j+1} - g_{i+1,j-1}}{2 h_y} - \frac{g_{i-1,j+1} - g_{i-1,j-1}}{2 h_y} + \mathcal{O}(h_y^2) \right] + \mathcal{O}(h_x^2) , \tag{2.43} $$

where we made use of the notation $g_{i,j} \equiv g(x_i, y_j)$. Neglecting higher order contributions yields

$$ \left. \frac{\partial}{\partial y} \frac{\partial}{\partial x} g(x, y) \right|_{(i,j)} \approx \frac{g_{i+1,j+1} - g_{i+1,j-1} - g_{i-1,j+1} + g_{i-1,j-1}}{4 h_x h_y} . \tag{2.44} $$


This  simple  approximation  is  easily  improved  with  the  help  of  methods  developed 
in  the  previous  sections. 
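Equation (2.44) can be tried out on a function whose mixed derivative is known. The sketch below is illustrative: the test function $g(x, y) = \sin(x)\exp(-y^2)$, the evaluation point and the spacings are choices made here.

```python
import math


def mixed_central(g, x, y, hx, hy):
    """Eq. (2.44): central difference estimate of d^2 g / (dy dx) at (x, y)."""
    return (g(x + hx, y + hy) - g(x + hx, y - hy)
            - g(x - hx, y + hy) + g(x - hx, y - hy)) / (4.0 * hx * hy)


def g(x, y):
    return math.sin(x) * math.exp(-y * y)


def g_xy(x, y):
    # analytic mixed derivative of the test function above
    return math.cos(x) * (-2.0 * y) * math.exp(-y * y)


x0, y0 = 0.4, 0.7
approx = mixed_central(g, x0, y0, 1e-3, 1e-3)
print(approx, g_xy(x0, y0))
```

Only the four diagonal neighbours of the point $(i, j)$ enter the formula; the values on the grid lines through $(i, j)$ cancel.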

It should be noted that there are also other methods to approximate derivatives. One of the most powerful methods is the method of finite elements [8]. The conceptual difference to the method of finite differences is that one divides the domain into finite sub-domains (elements) rather than replacing it by a set of discrete grid-points. The function of interest, say $g(x, y)$, is then replaced within each element by an interpolating polynomial. However, this method is quite complex and definitely beyond the scope of this book. Another interesting method, which is particularly useful for the solution of hyperbolic differential equations, is the method of finite volumes. The interested reader is referred to the book by R. J. LeVeque [9].

Summary

In  a  first  step  the  notion  of  finite  differences  was  introduced:  All  functions 
are  approximated  only  by  their  functional  values  at  discrete  grid-points  and  by 
interpolation  schemes  between  these  points.  This  served  as  a  basis  for  the  definition 
of  finite  difference  derivatives.  Three  different  types  were  discussed:  the  forward, 
the  backward,  and  the  central  difference  derivative.  A  more  systematic  approach  to 
finite  difference  derivatives  was  then  offered  by  the  operator  technique.  It  provided 
ready-to-use equations which allowed one to approximate a particular derivative of arbitrary order to arbitrary order of grid-spacing. The two methodological errors
introduced  by  this  method,  namely  the  subtractive  cancellation  error  due  to  too 
dense  a  grid  and  the  truncation  error  due  to  too  coarse  a  grid  were  discussed  in 
detail. 


Problems 

1.  Derive  Eq.  (2.32). 

2. Calculate numerically the derivative of the function

$$ f(x) = \cos(\omega_1 x) + \exp(-x^2/2) \sin(\omega_2 x) , $$

with $\omega_1 = 0.5$ and $\omega_2 = 10\,\omega_1$. Use a non-uniform grid. Calculate locally the relative error of your approximation.

3. Extend your code of the previous example to arbitrary $\omega_1 < 10\,\omega_2$ and $\omega_2 = 0.5$ by implementing an adaptive grid-spacing. In particular, write a routine which recursively finds a suitable grid-spacing.

4. Consider the finite interval $I = [-5, 5]$ on the real axis. Define $N$ equally spaced grid-points $x_i = x_1 + (i - 1)h$, $i = 1, \ldots, N$. Investigate the functions

$$ g(x) = \exp(-x^2) \quad \text{and} \quad h(x) = \sin(x) . $$

a. Plot these functions within the interval $I$ by defining these functions on the grid-points $x_i$.

b. Plot the first derivative of these functions by analytical differentiation.

c. Calculate and plot the first derivatives of these functions by employing the first order backward, forward, and central difference derivatives. For the central difference derivative use an algorithm which is based on the grid-points $x_{i-1}$ and $x_{i+1}$ rather than the method based on intermediate grid-points $x_{i \pm 1/2}$.

d. Calculate and plot the first central difference derivatives of these functions by employing second order corrections. These corrections can be obtained by applying the sum representation of the derivative operator defined in Sect. 2.4, last line of Eq. (2.31), i.e. take the term proportional to $\delta_c^3$ into account!

e. Calculate the absolute and the relative error of the above methods. Note that the exact values are known analytically.

f. Repeat the above steps for the second derivative of the function $h(x)$. For the second order correction of the central difference derivative take the term proportional to $\delta_c^4$ in Eq. (2.32) into account.

g. Try different values of $N$.

5. Consider the function:

$$ f(x, y) = \cos(x) \exp(-y^2) . $$

a. Calculate numerically its gradient $\nabla f(x, y)$ and compare with the analytical result.

b. Demonstrate numerically that gradient fields are curl-free, i.e. $\nabla \times \nabla f(x, y) = 0$ for all $x$ and $y$.


Chapter  3

Numerical  Integration 


3.1  Introduction 

Numerical integration is certainly one of the most important concepts in computational analysis since it plays a major role in the numerical treatment of differential equations. Given a function $f(x)$ which is continuous on the interval $[a, b]$, one wishes to approximate the integral by a discrete sum of the form

$$ \int_a^b dx\, f(x) \approx \sum_{i=1}^{N} \omega_i f(x_i) , \tag{3.1} $$

where the $\omega_i$ are referred to as weights and $x_i$ are the grid-points at which the function needs to be evaluated. Such methods are commonly referred to as quadrature [1, 2].

We  will  mainly  discuss  two  different  approaches  to  the  numerical  integration  of 
arbitrary functions. We start with a rather simple approach, the rectangular rule. The search for an improvement of this method will lead us first to the trapezoidal rule, then to the Simpson rule and, finally, to a general formulation of the method, the
Newton-Cotes  quadrature.  This  will  be  followed  by  a  more  advanced  technique, 
the  Gauss-Legendre  quadrature.  At  the  end  of  the  chapter  we  will  discuss  an 
elucidating  example  and  briefly  sketch  extensions  of  all  methods  to  more  general 
problems,  such  as  integration  of  non-differentiable  functions  or  the  evaluation  of 
multiple  integrals. 

Another  very  important  approach,  which  is  based  on  random  sampling  methods, 
is  the  so  called  Monte-Carlo  integration.  This  method  will  be  presented  in  Sect.  14.2. 


3.2  Rectangular  Rule

The straightforward approach to numerical integration is to employ the concept of finite differences developed in Sect. 2.2. We regard a smooth function $f(x)$ within the interval $[a, b]$, i.e. $f(x) \in C^{\infty}[a, b]$. The Riemann definition of the proper integral of $f(x)$ from $a$ to $b$ states that:

$$ \int_a^b dx\, f(x) = \lim_{N \to \infty} \frac{b - a}{N} \sum_{i=0}^{N-1} f\left( a + i\, \frac{b - a}{N} \right) . \tag{3.2} $$

We approximate the right hand side of this relation using equally spaced grid-points $x_i \in [a, b]$ according to Eq. (2.1) and find

$$ \int_a^b dx\, f(x) \approx h \sum_{i=1}^{N-1} f_i . \tag{3.3} $$

It is clear that the quality of this approach strongly depends on the discretization chosen, i.e. on the values of $x_i$ as illustrated schematically in Fig. 3.1. Again, a non-uniform grid may be of advantage. We can estimate the error of this approximation by expanding $f(x)$ into a Taylor series.

We note that

$$ \int_a^b dx\, f(x) = \sum_{i=1}^{N-1} \int_{x_i}^{x_{i+1}} dx\, f(x) , \tag{3.4} $$

hence, the approximation (3.3) is equivalent to an estimate of the area in the unit interval, the elemental area:

$$ \int_{x_i}^{x_{i+1}} dx\, f(x) \approx h f_i . \tag{3.5} $$

Fig. 3.1 Illustration of the numerical approximation of a proper integral according to Eq. (3.3)

Furthermore, we find following Eq. (2.9a):

$$ \int_{x_i}^{x_{i+1}} dx\, f(x) = \int_{x_i}^{x_{i+1}} dx \left[ f_i + (x - x_i) f'_{i+\epsilon_\zeta} \right] = f_i h + \mathcal{O}(h^2) . \tag{3.6} $$


In this last step we applied the first mean value theorem for integration which states that if $f(x)$ is continuous in $[a, b]$, then there exists a $\zeta \in [a, b]$ such that

$$ \int_a^b dx\, f(x) = (b - a) f(\zeta) . \tag{3.7} $$

(We shall come back to the mean value theorem in the course of our discussion of Monte-Carlo integration in Chap. 14.) Consequently, the error we make in each elemental area with approximation (3.3) can be seen from Eq. (3.6) to be of the order $\mathcal{O}(h^2)$. Since $N - 1 = (b - a)/h$ elemental areas are summed, the error of the full sum is of the order $\mathcal{O}(h)$.

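This procedure corresponds to a forward difference approach and, equivalently, backward differences can be used. This results in:

$$ \int_a^b dx\, f(x) = h \sum_{i=2}^{N} f_i + \mathcal{O}(h) , \tag{3.8} $$

where the errors $\mathcal{O}(h^2)$ of the $N - 1 = (b - a)/h$ elemental areas add up to $\mathcal{O}(h)$.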

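Let us now define the forward and backward rectangular rule by

$$ {}_{i}I^{+}_{i+1} = h f_i , \tag{3.9} $$

and

$$ {}_{i}I^{-}_{i+1} = h f_{i+1} , $$

respectively. Thus, we obtain from Taylor's expansion that:

$$ \int_{x_i}^{x_{i+1}} dx\, f(x) = {}_{i}I^{+}_{i+1} + \frac{h^2}{2} f_i' + \frac{h^3}{3!} f_i'' + \ldots \tag{3.10} $$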
However, the use of central differences gives more accurate results as has already been observed in Chap. 2 in which the differential operator was approximated. We make use of the concept of intermediate grid-points (see Sect. 2.4) and consider the integral

$$ \int_{x_i}^{x_{i+1}} dx\, f(x) , $$

expand $f(x)$ in a Taylor series around the midpoint $x_{i+1/2}$, and obtain:

$$ \int_{x_i}^{x_{i+1}} dx\, f(x) = h f_{i+1/2} + \frac{h^3}{24} f''_{i+\epsilon_\zeta} \tag{3.12} $$

$$ = {}_{i}I_{i+1} + \frac{h^3}{24} f''_{i+\epsilon_\zeta} . \tag{3.13} $$

Thus, the error generated by this method, the central rectangular rule, scales as $\mathcal{O}(h^3)$ for each elemental area, which is a significant improvement in comparison to Eqs. (3.3) and (3.8). Summing the $N - 1 = (b - a)/h$ elemental areas we obtain

$$ \int_a^b dx\, f(x) = h \sum_{i=1}^{N-1} f_{i+1/2} + \mathcal{O}(h^2) . \tag{3.14} $$

This approximation is known as the rectangular rule. It is illustrated in Fig. 3.2. Note that the boundary points $x_1 = a$ and $x_N = b$ do not enter Eq. (3.14). Such a procedure is commonly referred to as an open integration rule. On the other hand, if the end-points are taken into account by the method it is referred to as a closed integration rule.
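The three variants of the rectangular rule differ only in where the integrand is sampled inside each elemental area. The following sketch is an illustration under assumptions made here (the function name `rectangular`, the `kind` keyword, and the test integral of $\exp x$ over $[0, 1]$ are not from the book).

```python
import math


def rectangular(f, a, b, n, kind="central"):
    """Sum the elemental areas on n equally spaced grid-points x_1 = a, ..., x_n = b."""
    h = (b - a) / (n - 1)
    # position of the sampling point inside each sub-interval, in units of h
    offset = {"forward": 0.0, "backward": 1.0, "central": 0.5}[kind]
    return h * sum(f(a + (i + offset) * h) for i in range(n - 1))


exact = math.e - 1.0                      # integral of exp(x) from 0 to 1
results = {}
for kind in ("forward", "backward", "central"):
    results[kind] = rectangular(math.exp, 0.0, 1.0, 101, kind)
    print(kind, abs(results[kind] - exact))
```

For this monotonically increasing integrand the forward rule underestimates and the backward rule overestimates the integral, while the central rule is considerably closer to the exact value.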


¹ In this context the intermediate position $x_{i+1/2}$ is understood as a true grid-point. If, on the other hand, the function value $f_{i+1/2}$ is approximated by $\mu f_{i+1/2}$, Eq. (2.29), the method is referred to as the trapezoidal rule.
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Fig.  3.2  Scheme  of  the 
rectangular  integration  rule 
according  to  Eq.  (3.14).  Note 
that  boundary  points  do  not 
enter  the  evaluation  of  the 
elemental  areas 


3.3  Trapezoidal  Rule 


An elegant alternative to the rectangular rule is found when the area between two grid-points is approximated by a trapezoid, as is shown schematically in Fig. 3.3. The trapezoidal rule is obtained when the function values $f_{i+1/2}$ at intermediate grid-points on the right hand side of the central rectangular rule (3.13) are approximated with the help of $\mu f_{i+1/2}$, Eq. (2.29). Thus, the elemental area is calculated from

$${}_{i}I^{T}_{i+1} = \frac{h}{2}\left(f_i + f_{i+1}\right) \tag{3.15}$$

and we obtain:

$$\int_a^b dx\, f(x) \approx \frac{h}{2}\sum_{i=1}^{N-1}\left(f_i + f_{i+1}\right) = h\left(\frac{f_1}{2} + f_2 + \ldots + f_{N-1} + \frac{f_N}{2}\right) = \frac{h}{2}\left(f_1 + f_N\right) + h\sum_{i=2}^{N-1} f_i = {}_{1}I^{T}_{N}. \tag{3.16}$$


Note that this integration rule is closed, although the boundary values $f_1$ and $f_N$ enter the summation (3.16) only with half the weight in comparison to all other function values $f_i$. This stems from the fact that the function values $f_1$ and $f_N$ contribute only to one elemental area, the first and the last one. Another noticeable feature of the trapezoidal rule is that, in contrast to the rectangular rule (3.14), only function values at grid-points enter the summation, which can be desirable in some cases.

Fig. 3.3 Sketch of how the elemental areas under the curve $f(x)$ are approximated by trapezoids

The error of this method can be estimated by inserting expansion (2.9a) into Eq. (3.16). One obtains for an elemental area:

$${}_{i}I^{T}_{i+1} = \frac{h}{2}\left(f_i + f_{i+1}\right) = h f_i + \frac{h^2}{2} f'_i + \frac{h^3}{4} f''_i + \ldots \tag{3.17}$$

On the other hand, we know from Eq. (3.6) that

$$h f_i = \int_{x_i}^{x_{i+1}} dx\, f(x) - \frac{h^2}{2} f'_i - \frac{h^3}{3!} f''_i - \ldots, \tag{3.18}$$

which, when inserted into (3.17), yields

$${}_{i}I^{T}_{i+1} = \int_{x_i}^{x_{i+1}} dx\, f(x) + \frac{h^3}{12} f''_i + \ldots \tag{3.19}$$

Hence, we observe that the error induced by the trapezoidal rule is comparable to
the error of the rectangular rule, namely $\mathcal{O}(h^3)$. However, since we do not have to
compute  function  values  at  intermediate  grid-points,  this  rule  may  be  advantageous 
in  many  cases. 
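The composite rule (3.16) and the elemental error term of Eq. (3.19) can be checked numerically. The sketch below is an illustration added here, not the authors' implementation; the function name `trapezoidal` and the test integrand exp(x) are assumptions.

```python
import math

def trapezoidal(f, a, b, n_points):
    """Composite trapezoidal rule, Eq. (3.16), on grid-points x_1 = a, ..., x_N = b."""
    if n_points < 2:
        raise ValueError("need at least two grid-points")
    h = (b - a) / (n_points - 1)
    inner = sum(f(a + i * h) for i in range(1, n_points - 1))
    # Boundary values enter with half weight, all inner values with full weight.
    return 0.5 * h * (f(a) + f(b)) + h * inner

# Elemental check of Eq. (3.19) for f = exp on the single interval [0, h]:
h = 0.1
elemental_error = trapezoidal(math.exp, 0.0, h, 2) - (math.exp(h) - 1.0)
leading_term = h ** 3 / 12.0 * math.exp(0.0)  # h^3/12 * f''(0), with f''(0) = 1
print(elemental_error, leading_term)
```

The two printed numbers agree up to the higher order terms that Eq. (3.19) omits.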

We remember from Chap. 2 that a more accurate estimate of a derivative was achieved by increasing the number of grid-points involved, which in the case of integration leads us to the Simpson rule.

3.4 The Simpson Rule

The basic idea of the Simpson rule is to include higher order derivatives into the expansion of the integrand. These higher order derivatives, which are primarily unknown, are then approximated by expressions we obtained within the context of finite difference derivatives. Let us discuss this procedure in greater detail. To this purpose we will study the integral of $f(x)$ within the interval $[x_{i-1}, x_{i+1}]$ and expand the integrand around the midpoint $x_i$:

$$\int_{x_{i-1}}^{x_{i+1}} dx\, f(x) = \int_{x_{i-1}}^{x_{i+1}} dx \left[ f_i + (x - x_i) f'_i + \frac{(x - x_i)^2}{2!} f''_i + \frac{(x - x_i)^3}{3!} f'''_i + \ldots \right] = 2h f_i + \frac{h^3}{3} f''_i + \mathcal{O}(h^5). \tag{3.20}$$

Inserting Eq. (2.34) for $f''_i$ yields

$$\int_{x_{i-1}}^{x_{i+1}} dx\, f(x) = 2h f_i + \frac{h}{3}\left(f_{i+1} - 2 f_i + f_{i-1}\right) + \mathcal{O}(h^5) = h\left(\frac{1}{3} f_{i-1} + \frac{4}{3} f_i + \frac{1}{3} f_{i+1}\right) + \mathcal{O}(h^5). \tag{3.21}$$


Note that in contrast to the trapezoidal rule, the procedure described here is a three point method since the function values at three different points enter the expression. We can immediately write down the resulting integral from $a$ to $b$. We decompose

$$\int_a^b dx\, f(x) = \int_{x_0}^{x_2} dx\, f(x) + \int_{x_2}^{x_4} dx\, f(x) + \ldots + \int_{x_{N-2}}^{x_N} dx\, f(x), \tag{3.22}$$

where we assumed that $N$ is even and employed the discretization $x_i = x_0 + ih$ with $x_0 = a$ and $x_N = b$. We obtain:

$$\int_a^b dx\, f(x) = \frac{h}{3}\left(f_0 + 4 f_1 + 2 f_2 + 4 f_3 + \ldots + 2 f_{N-2} + 4 f_{N-1} + f_N\right) + \mathcal{O}(h^5). \tag{3.23}$$

This expression is exact for polynomials of degree $n \le 3$ since the first term in the error expansion involves the fourth derivative. Hence, whenever the integrand is satisfactorily reproducible by a polynomial of degree three or less, the Simpson rule might give almost exact estimates, independent of the discretization $h$.

The arguments applied above allow for a straightforward extension to four- or even more-point rules. We find, for instance,

$$\int_{x_i}^{x_{i+3}} dx\, f(x) = \frac{3h}{8}\left(f_i + 3 f_{i+1} + 3 f_{i+2} + f_{i+3}\right) + \mathcal{O}(h^5), \tag{3.24}$$

which is usually called Simpson's three-eight rule.
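The composite Simpson rule (3.23) and the elemental three-eight rule (3.24) can be sketched as follows. This is an illustrative example added here, assuming the grid $x_i = a + ih$, $i = 0, \ldots, N$ with even $N$; the function names and the cubic test polynomial are hypothetical choices. Both rules should reproduce the integral of a cubic up to rounding errors.

```python
import math

def simpson(f, a, b, n_intervals):
    """Composite Simpson rule, Eq. (3.23), with x_i = a + i*h, i = 0..N and N even."""
    if n_intervals < 2 or n_intervals % 2:
        raise ValueError("N must be even and at least 2")
    h = (b - a) / n_intervals
    total = f(a) + f(b)
    for i in range(1, n_intervals):
        # Odd grid-points carry weight 4, even inner grid-points weight 2.
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return h * total / 3.0

def simpson_three_eighths(f, x_i, h):
    """Elemental three-eight rule, Eq. (3.24), on [x_i, x_i + 3h]."""
    return 3.0 * h / 8.0 * (f(x_i) + 3.0 * f(x_i + h)
                            + 3.0 * f(x_i + 2.0 * h) + f(x_i + 3.0 * h))

def cubic(x):
    return x ** 3 - 2.0 * x + 1.0  # its integral over [0, 2] equals 2

print(simpson(cubic, 0.0, 2.0, 2))
print(simpson_three_eighths(cubic, 0.0, 2.0 / 3.0))
print(abs(simpson(math.exp, 0.0, 1.0, 10) - (math.e - 1.0)))
```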

It  is  important  to  note  that  all  the  methods  discussed  so  far  are  special  cases  of 
a  more  general  formulation,  the  Newton-Cotes  rules  [2]  which  will  be  discussed 
in  the  next  section. 


3.5  General  Formulation:  The  Newton-Cotes  Rules 

We define the Lagrange interpolating polynomial $p_{n-1}(x)$ of degree $n-1$ [3-5] to a function $f(x)$ as

$$p_{n-1}(x) = \sum_{j=1}^{n} f_j L_j^{(n-1)}(x), \tag{3.25}$$

where

$$L_j^{(n-1)}(x) = \prod_{\substack{k=1 \\ k \neq j}}^{n} \frac{x - x_k}{x_j - x_k}. \tag{3.26}$$

An arbitrary smooth function $f(x)$ can then be expressed with the help of a Lagrange polynomial of degree $n-1$ by

$$f(x) = p_{n-1}(x) + \frac{f^{(n)}[\xi(x)]}{n!}(x - x_1)(x - x_2)\ldots(x - x_n). \tag{3.27}$$


If we neglect the second term on the right hand side of this equation and integrate the Lagrange polynomial of degree $n-1$ over the $n$ grid-points from $x_1$ to $x_n$ we obtain the closed $n$-point Newton-Cotes formulas. For instance, if we set $n = 2$, then

$$p_1(x) = f_1 L_1^{(1)}(x) + f_2 L_2^{(1)}(x) = f_1 \frac{x - x_2}{x_1 - x_2} + f_2 \frac{x - x_1}{x_2 - x_1} = \frac{1}{h}\left[x(f_2 - f_1) - x_1 f_2 + x_2 f_1\right], \tag{3.28}$$

with $f_1 = f(x_1)$ and $f_2 = f(x_2)$. Integration over the respective interval yields

$$\int_{x_1}^{x_2} dx\, p_1(x) = \frac{1}{h}\left[\frac{x^2}{2}(f_2 - f_1) + x(x_2 f_1 - x_1 f_2)\right]_{x_1}^{x_2} = \frac{h}{2}\left[f_2 + f_1\right], \tag{3.29}$$

which is exactly the trapezoidal rule. By setting $n = 3$ one obtains Simpson's rule and setting $n = 4$ gives Simpson's three-eight rule.

2 The Lagrange polynomial $p_{n-1}(x)$ to the function $f(x)$ is the polynomial of degree $n-1$ that satisfies the $n$ equations $p_{n-1}(x_j) = f(x_j)$ for $j = 1, \ldots, n$, where $x_j$ denotes arbitrary but distinct grid-points.
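The weights of the Newton-Cotes rules follow mechanically from integrating the basis polynomials (3.26). The sketch below is an illustration added here, not the authors' code: it works in units of the grid spacing ($h = 1$), uses exact rational arithmetic from Python's `fractions` module, and the helper names (`poly_mul`, `poly_integral`, `newton_cotes_weights`) are hypothetical. For the open variant it assumes nodes $x_1, \ldots, x_n$ and integration limits $x_0 = x_1 - h$ and $x_{n+1} = x_n + h$.

```python
from fractions import Fraction

def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (lowest power first)."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_integral(p, lower, upper):
    """Definite integral of the polynomial p between lower and upper."""
    return sum(c * (upper ** (k + 1) - lower ** (k + 1)) / (k + 1)
               for k, c in enumerate(p))

def newton_cotes_weights(n, open_rule=False):
    """Weights w_j of an n-point rule: integral ~ h * sum_j w_j f_j (here h = 1)."""
    if open_rule:
        nodes = [Fraction(k) for k in range(1, n + 1)]
        lower, upper = Fraction(0), Fraction(n + 1)
    else:
        nodes = [Fraction(k) for k in range(n)]
        lower, upper = nodes[0], nodes[-1]
    weights = []
    for j, x_j in enumerate(nodes):
        basis = [Fraction(1)]
        for k, x_k in enumerate(nodes):
            if k != j:
                denom = x_j - x_k
                # Multiply by the linear factor (x - x_k) / (x_j - x_k) of Eq. (3.26).
                basis = poly_mul(basis, [-x_k / denom, Fraction(1) / denom])
        weights.append(poly_integral(basis, lower, upper))
    return weights

for n in (2, 3, 4):
    print(n, newton_cotes_weights(n))
print("open, n = 2:", newton_cotes_weights(2, open_rule=True))
```

With $n = 2, 3, 4$ the closed weights are those of the trapezoidal rule, of Simpson's rule and of the three-eight rule, respectively.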

The open Newton-Cotes rule can be obtained by integrating the polynomial $p_{n-1}(x)$ of degree $n-1$ which includes the grid-points $x_1, \ldots, x_n$ from $x_0$ to $x_{n+1}$. The fact that these relations are open means that the function values at the boundary points $x_0 = x_1 - h$ and $x_{n+1} = x_n + h$ do not enter the final expressions. The simplest open Newton-Cotes formula is the central integral approximation which we encountered as the rectangular rule (3.14). A second order approximation is easily found with the help of the two-point Lagrange polynomial (3.28):

$$\int_{x_0}^{x_3} dx\, p_1(x) = \frac{1}{h}\left[\frac{x^2}{2}(f_2 - f_1) + x(x_2 f_1 - x_1 f_2)\right]_{x_0}^{x_3} = \frac{3h}{2}\left[f_2 + f_1\right]. \tag{3.30}$$


Higher  order  approximations  can  be  obtained  in  a  similar  fashion.  To  conclude 
this  section  let  us  briefly  discuss  an  idea  which  is  referred  to  as  Romberg’s 
method  [6]. 

So far, we approximated all integrals by expressions of the form

$$I = \mathcal{I}_N + \mathcal{O}(h^m), \tag{3.31}$$

where $I$ is the exact, unknown, value of the integral, $\mathcal{I}_N$ is the estimate obtained from an integration scheme using $N$ grid-points, and $m$ is the leading order of the error. Let us review the error of the trapezoidal approximation: we learned that the error for the integral over the interval $[x_i, x_{i+1}]$ scales like $h^3$. Since we have $N$ such intervals, we conclude that the total error behaves like $(b-a)h^2$. Similarly, the error of the three-point Simpson rule is for each sub-interval proportional to $h^5$ and this gives in total $(b-a)h^4$. We assume that this trend can be generalized and conclude that the error of an $n$-point method with the estimate $\mathcal{I}^n_N$ behaves like $h^{2n-2}$. Since $h \propto N^{-1}$ we have

$$I = \mathcal{I}^n_N + \frac{C_N}{N^{2n-2}}, \tag{3.32}$$

where $C_N$ depends on the number of grid-points $N$. Let us double the amount of grid-points and we obtain:

$$I = \mathcal{I}^n_{2N} + \frac{C_{2N}}{(2N)^{2n-2}}. \tag{3.33}$$

Obviously, Eqs. (3.32) and (3.33) can be regarded as a linear system of equations in $I$ and $C$ if $C_N \approx C_{2N} \approx C$. Solving Eqs. (3.32) and (3.33) for $I$ yields

$$I \approx \frac{1}{4^{n-1} - 1}\left(4^{n-1}\mathcal{I}^n_{2N} - \mathcal{I}^n_N\right). \tag{3.34}$$

It has to be emphasized that in the above expression $I$ is no longer the exact value because of the approximation $C_N \approx C$. However, it is an improvement of the solution and it is possible to demonstrate that this new estimate is exactly the value one would have obtained with an integral approximation of order $n + 1$ and $2N$ grid-points! Thus

$$\mathcal{I}^{n+1}_{2N} = \frac{1}{4^{n-1} - 1}\left(4^{n-1}\mathcal{I}^n_{2N} - \mathcal{I}^n_N\right). \tag{3.35}$$

This suggests a very elegant and rapid procedure: We simply calculate the integrals using two point rules and add the results according to Eq. (3.35) to obtain more-point results. For instance, calculate $\mathcal{I}^2_2$ and $\mathcal{I}^2_4$, add these according to Eq. (3.35) and get $\mathcal{I}^3_4$. Now calculate $\mathcal{I}^2_8$, add $\mathcal{I}^2_4$, get $\mathcal{I}^3_8$, add $\mathcal{I}^3_4$ and get $\mathcal{I}^4_8$. This pyramid-like procedure can be continued until convergence is achieved, that is, until two successive estimates differ in magnitude by less than $\epsilon$, where $\epsilon > 0$ can be chosen arbitrarily. An illustration of this method is given in Fig. 3.4.

Fig. 3.4 Illustration of the Romberg method. Here, the $\mathcal{I}^m_n$ are synonyms for integrals where the first index $m$ refers to the order of the quadrature while the second index $n$ refers to the number of grid-points used. Note that we only have to use a second order integration scheme (left row inside the box), all other values are determined via Eq. (3.35) as indicated by the arrows
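The pyramid-like procedure can be sketched compactly. The following Python example is an illustration added here, not the authors' implementation; the function name `romberg`, the tolerance `eps`, the cap `max_levels` and the stopping test (comparison of the last entries of two successive rows) are assumptions. Only trapezoidal estimates are computed from function values; every other entry follows from Eq. (3.35).

```python
import math

def romberg(f, a, b, eps=1e-10, max_levels=20):
    """Romberg scheme: trapezoidal estimates refined with Eq. (3.35).

    Returns the final estimate and the full table; table[r][c] uses 2**r
    intervals and c applications of Eq. (3.35).
    """
    table = [[0.5 * (b - a) * (f(a) + f(b))]]
    n_intervals = 1
    for level in range(1, max_levels):
        n_intervals *= 2
        h = (b - a) / n_intervals
        # Trapezoidal rule on the refined grid; only the new grid-points are evaluated.
        new_points = sum(f(a + (2 * k - 1) * h) for k in range(1, n_intervals // 2 + 1))
        row = [0.5 * table[-1][0] + h * new_points]
        # Column c - 1 holds the order n = c + 1 of the text (column 0: n = 2),
        # so the factor 4^(n-1) of Eq. (3.35) equals 4^c.
        for c in range(1, level + 1):
            factor = 4 ** c
            row.append((factor * row[c - 1] - table[-1][c - 1]) / (factor - 1))
        table.append(row)
        if abs(row[-1] - table[-2][-1]) < eps:
            break
    return table[-1][-1], table

value, table = romberg(lambda x: 1.0 / (x + 2.0), -1.0, 1.0)
print(value, math.log(3.0))
```

In this run the first extrapolated entry, `table[1][1]`, coincides with the three-point Simpson estimate on the same grid, in line with the statement following Eq. (3.34).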


3.6  Gauss-Legendre  Quadrature 

In preparation for the Gauss-Legendre quadrature we introduce a set of orthogonal Legendre polynomials $P_\ell(x)$ [3, 4, 7, 8] which are solutions of the Legendre differential equation

$$(1 - x^2) P''_\ell(x) - 2x P'_\ell(x) + \ell(\ell + 1) P_\ell(x) = 0. \tag{3.36}$$

This equation occurs, for instance, when the Laplace equation $\Delta f(x) = 0$ is
transformed  to  spherical  coordinates.  Here,  we  will  introduce  the  most  important 
properties  of  Legendre  polynomials  which  will  be  required  for  an  understanding 
of  the  Gauss-Legendre  quadrature. 

Legendre polynomials are given by

$$P_\ell(x) = \sum_{k=0}^{\infty} a_{k,\ell}\, x^k, \tag{3.37}$$

where the coefficients can be determined recursively:

$$a_{k+2,\ell} = \frac{k(k+1) - \ell(\ell+1)}{(k+1)(k+2)}\, a_{k,\ell}. \tag{3.38}$$


Hence, for even values of $\ell$ the Legendre polynomial involves only even powers of $x$ and for odd $\ell$ only odd powers of $x$. Note also that according to Eq. (3.38) for $k > \ell$ the coefficients are equal to zero and, thus, it follows from Eq. (3.37) that the $P_\ell(x)$ are polynomials of degree $\ell$. Furthermore, the Legendre polynomials fulfill the orthogonality condition

$$\int_{-1}^{1} dx\, P_\ell(x) P_{\ell'}(x) = \frac{2}{2\ell' + 1}\,\delta_{\ell\ell'}, \tag{3.39}$$

where $\delta_{\ell\ell'}$ is Kronecker's delta. One obtains, in particular,

$$P_0(x) = 1, \tag{3.40}$$

and

$$P_1(x) = x. \tag{3.41}$$

Another convenient way to calculate Legendre polynomials is based on Rodrigues' formula

$$P_\ell(x) = \frac{1}{2^\ell\, \ell!}\frac{d^\ell}{dx^\ell}\left(x^2 - 1\right)^\ell. \tag{3.42}$$
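Rodrigues' formula (3.42) and the recursion (3.38) can be cross-checked with exact rational arithmetic. The sketch below is an illustration added here; the helper names are hypothetical, and polynomials are stored as coefficient lists with the lowest power first.

```python
from fractions import Fraction
from math import factorial

def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (lowest power first)."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_derivative(p):
    """Derivative of a polynomial given as a coefficient list."""
    return [k * c for k, c in enumerate(p)][1:] or [Fraction(0)]

def legendre_coefficients(degree):
    """Coefficients a_{k,l} of P_l(x) obtained from Rodrigues' formula (3.42)."""
    poly = [Fraction(1)]
    for _ in range(degree):
        poly = poly_mul(poly, [Fraction(-1), Fraction(0), Fraction(1)])  # (x^2 - 1)
    for _ in range(degree):
        poly = poly_derivative(poly)
    norm = Fraction(1, 2 ** degree * factorial(degree))
    return [norm * c for c in poly]

def satisfies_recursion(coeffs, degree):
    """Check Eq. (3.38) for every k with k + 2 <= degree."""
    return all(
        coeffs[k + 2] == Fraction(k * (k + 1) - degree * (degree + 1),
                                  (k + 1) * (k + 2)) * coeffs[k]
        for k in range(len(coeffs) - 2)
    )

for degree in range(5):
    coeffs = legendre_coefficients(degree)
    print(degree, coeffs, satisfies_recursion(coeffs, degree))
```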


We focus now on the core of the Gauss-Legendre quadrature and introduce the function $F(x)$ as a transform of the function $f(x)$

$$F(x) = \frac{b-a}{2}\, f\!\left(\frac{b-a}{2}\,x + \frac{b+a}{2}\right) \tag{3.43}$$

in such a way that we can rewrite the integral of interest as:

$$\int_a^b dx\, f(x) = \int_{-1}^{1} dx\, F(x). \tag{3.44}$$

If the function $F(x)$ can be well approximated by some polynomial of degree $2n-1$ like

$$F(x) \approx p_{2n-1}(x), \tag{3.45}$$

then this means that according to Taylor's theorem (2.7) the error introduced by this approximation is proportional to $F^{(2n)}(x)$. If the polynomial $p_{2n-1}(x)$ is explicitly given then we can apply the methods discussed in the previous sections to approximate the integral (3.44). However, even if the polynomial is not explicitly given we write the integral (3.44) as

$$\int_{-1}^{1} dx\, F(x) = \sum_{i=1}^{n} \omega_i F(x_i), \tag{3.46}$$

with weights $\omega_i$ and grid-points $x_i$, $i = 1, \ldots, n$, which are yet undetermined! Therefore, we will determine the weights $\omega_i$ and grid-points $x_i$ in such a way that the integral is well approximated even if the polynomial $p_{2n-1}$ in Eq. (3.45) is unknown. For this purpose we decompose $p_{2n-1}(x)$ into

$$p_{2n-1}(x) = p_{n-1}(x) P_n(x) + q_{n-1}(x), \tag{3.47}$$

where $P_n(x)$ is the Legendre polynomial of degree $n$ and $p_{n-1}(x)$ and $q_{n-1}(x)$ are polynomials of degree $n-1$. Since $p_{n-1}(x)$ itself is a polynomial of degree $n-1$, it can also be expanded in Legendre polynomials of degrees up to $n-1$ by

$$p_{n-1}(x) = \sum_{i=0}^{n-1} a_i P_i(x). \tag{3.48}$$

Using Eq. (3.48) in (3.47) we obtain together with the orthogonality relation (3.39)

$$\int_{-1}^{1} dx\, p_{2n-1}(x) = \sum_{i=0}^{n-1} a_i \int_{-1}^{1} dx\, P_i(x) P_n(x) + \int_{-1}^{1} dx\, q_{n-1}(x) = \int_{-1}^{1} dx\, q_{n-1}(x). \tag{3.49}$$

Moreover, since $P_n(x)$ is a Legendre polynomial of degree $n$ it has $n$ zeros in the interval $[-1, 1]$ and Eq. (3.47) results in

$$p_{2n-1}(x_i) = q_{n-1}(x_i), \tag{3.50}$$

where $x_1, x_2, \ldots, x_n$ denote the zeros of $P_n(x)$ and these zeros determine the grid-points of our integration routine. It is interesting to note that these zeros are independent of the function $F(x)$ we want to integrate. We also expand $q_{n-1}(x)$ in terms of Legendre polynomials

$$q_{n-1}(x) = \sum_{i=0}^{n-1} b_i P_i(x), \tag{3.51}$$

and use it in Eq. (3.50) to obtain

$$p_{2n-1}(x_i) = \sum_{k=0}^{n-1} b_k P_k(x_i), \quad i = 1, \ldots, n, \tag{3.52}$$

which can be written in a more compact form by defining $p_i \equiv p_{2n-1}(x_i)$ and $P_{ki} \equiv P_k(x_i)$:

$$p_i = \sum_{k=0}^{n-1} b_k P_{ki}, \quad i = 1, \ldots, n. \tag{3.53}$$

It has to be emphasized again that the grid-points $x_i$ are independent of the polynomial $p_{2n-1}(x)$ and, therefore, independent of $F(x)$. Furthermore, we can replace $p_i \approx F(x_i) \equiv F_i$ according to Eq. (3.45). We recognize that Eq. (3.53) corresponds to a system of linear equations which can be solved for the coefficients $b_k$. We obtain

$$b_k = \sum_{i=1}^{n} F_i \left[P^{-1}\right]_{ik}, \tag{3.54}$$

where $P$ is the matrix $P = \{P_{ij}\}$, which is known to be non-singular. We can now rewrite the integral (3.44) with the help of Eqs. (3.45), (3.49), and (3.51) together with the properties of the zeros of Legendre polynomials [7, 8] as

$$\int_{-1}^{1} dx\, F(x) \approx \int_{-1}^{1} dx\, p_{2n-1}(x) = \sum_{k=0}^{n-1} b_k \int_{-1}^{1} dx\, P_k(x). \tag{3.55}$$

Since $P_0(x) = 1$ according to Eq. (3.40), we deduce from Eq. (3.39)

$$\int_{-1}^{1} dx\, P_k(x) = \int_{-1}^{1} dx\, P_k(x) P_0(x) = \frac{2}{2k+1}\,\delta_{k0} = 2\delta_{k0}. \tag{3.56}$$

Hence, Eq. (3.55) reads

$$\int_{-1}^{1} dx\, F(x) \approx 2 b_0 = 2\sum_{i=1}^{n} F_i \left[P^{-1}\right]_{i0}. \tag{3.57}$$

By defining

$$\omega_i = 2\left[P^{-1}\right]_{i0}, \tag{3.58}$$

we arrive at the desired expansion

$$\int_{-1}^{1} dx\, F(x) \approx \sum_{i=1}^{n} \omega_i F_i. \tag{3.59}$$

Moreover, since we approximated $F(x)$ by a polynomial of degree $2n-1$, the Gauss-Legendre quadrature is exact for polynomials of degree $2n-1$, i.e. the error is proportional to a derivative of $F(x)$ of order $2n$. Furthermore, expression (3.58) can be put in a more convenient form. One can show that

$$\omega_i = \frac{2}{\left(1 - x_i^2\right)\left[P'_n(x_i)\right]^2}, \tag{3.60}$$

where

$$P'_n(x_i) = \left.\frac{d}{dx} P_n(x)\right|_{x = x_i}. \tag{3.61}$$

Let us make some concluding remarks. The grid-points $x_i$ as well as the weights $\omega_i$ are independent of the actual function $F(x)$ we want to integrate. This means that one can tabulate these values once and for all [7, 8] and use them for different types of problems. The grid-points $x_i$ are symmetrically distributed around the point $x = 0$, i.e. for every $x_j$ there is a $-x_j$. Furthermore, these two grid-points have the same weight $\omega_j$. The density of grid-points increases approaching the boundary; however, the boundary points themselves are not included, which means that the Gauss-Legendre quadrature is an open method. Furthermore, it has to be emphasized that low order Gauss-Legendre parameters can easily be calculated by employing relation (3.42). This makes the Gauss-Legendre quadrature the predominant integration method. In comparison to the trapezoidal rule or even the Romberg method, it needs in many cases a smaller number of grid-points, is simpler to implement, converges faster and yields more accurate results. One drawback of this method is that one has to compute the function $F(x)$ at the zeros $x_i$ of the Legendre polynomial. This can be a problem if the integrand at hand is not known analytically.
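Instead of looking up tabulated values, the grid-points and weights can also be generated numerically. The following sketch is an illustration added here, not the authors' method: it locates the zeros of $P_n(x)$ by Newton iteration and evaluates the weights from Eq. (3.60). The three-term recurrence for $P_n$, the derivative identity $P'_n(x) = n\,[x P_n(x) - P_{n-1}(x)]/(x^2 - 1)$ and the cosine starting values are standard results that are not derived in the text; all function names are hypothetical.

```python
import math

def legendre_and_derivative(n, x):
    """Return P_n(x) and P_n'(x) for -1 < x < 1 via the three-term recurrence."""
    if n == 0:
        return 1.0, 0.0
    p_prev, p = 1.0, x  # P_0 and P_1
    for k in range(2, n + 1):
        p_prev, p = p, ((2 * k - 1) * x * p - (k - 1) * p_prev) / k
    dp = n * (x * p - p_prev) / (x * x - 1.0)
    return p, dp

def gauss_legendre(n):
    """Grid-points (zeros of P_n) and weights, Eq. (3.60), on [-1, 1]."""
    nodes, weights = [], []
    for i in range(1, n + 1):
        x = math.cos(math.pi * (i - 0.25) / (n + 0.5))  # starting value for Newton
        for _ in range(100):
            p, dp = legendre_and_derivative(n, x)
            dx = p / dp
            x -= dx
            if abs(dx) < 1e-15:
                break
        _, dp = legendre_and_derivative(n, x)
        nodes.append(x)
        weights.append(2.0 / ((1.0 - x * x) * dp * dp))
    return nodes, weights

def integrate_gauss_legendre(f, a, b, n):
    """Approximate the integral of f over [a, b] using the transform (3.43)."""
    nodes, weights = gauss_legendre(n)
    half, mid = 0.5 * (b - a), 0.5 * (b + a)
    return half * sum(w * f(half * x + mid) for x, w in zip(nodes, weights))

print(gauss_legendre(2))
print(integrate_gauss_legendre(lambda x: x ** 5 + x ** 4, -1.0, 1.0, 3))
```

With $n = 3$ grid-points the second call integrates a polynomial of degree $2n - 1 = 5$, for which the rule is exact up to rounding errors.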

It is important to note at this point that comparable procedures exist which use other types of orthogonal polynomials, such as Hermite polynomials. The corresponding procedure is known as the Gauss-Hermite quadrature.

Table 3.1 lists the methods, discussed in the previous sections, which allow one to calculate numerically an estimate of integrals of the form:

$$\int_a^b dx\, f(x). \tag{3.62}$$

Equal grid-spacing $h$ is assumed, with the Gauss-Legendre method as the only exception. The particular value of $h$ depends on the order of the method employed and is given in Table 3.1.




Table 3.1 Summary of the quadrature methods discussed in this chapter applied to the integral $\int_a^b dx\, f(x)$. For a detailed description consult the corresponding sections. Equal grid-spacing is assumed for all methods except for the Gauss-Legendre quadrature. The explicit values of $h$ depend on the order of the method and are listed in the table. Furthermore, we use $x_i = a + ih$ and denote $f(x_i) = f_i$. The function $P^{(m)}(x)$ which appears in the description of the Newton-Cotes rules denotes the $m$-th order Lagrange interpolating polynomial and $P_m(x)$ is the $m$-th degree Legendre polynomial

| $n$ | $h$ | $\mathcal{I}$ | Method | Comment |
|---|---|---|---|---|
| 1 | $\frac{b-a}{2}$ | $2h f_1$ | Rectangular | Open |
| 2 | $b - a$ | $\frac{h}{2}(f_0 + f_1)$ | Trapezoidal | Closed |
| 3 | $\frac{b-a}{2}$ | $\frac{h}{3}(f_0 + 4 f_1 + f_2)$ | Simpson | Closed |
| 4 | $\frac{b-a}{3}$ | $\frac{3h}{8}(f_0 + 3 f_1 + 3 f_2 + f_3)$ | Simpson 3/8 | Closed |
| $m$ | $\frac{b-a}{m-1}$ | $\int_{x_0}^{x_{m-1}} dx\, P^{(m)}(x)$ | Newton-Cotes | Closed |
| $m$ | $\frac{b-a}{m+1}$ | $\int_{x_0}^{x_{m+1}} dx\, P^{(m)}(x)$ | Newton-Cotes | Open |
| $m$ | $P_m(x_j) = 0$, $\tilde{x}_j = \frac{a+b}{2} + \frac{b-a}{2}\, x_j$ | $\omega_j = \frac{2}{(1 - x_j^2)\left[P'_m(x_j)\right]^2}$ | Gauss-Legendre | Open |

3.7  An  Example 


Let us discuss as an example the following proper integral:

$$I = \int_{-1}^{1} \frac{dx}{x + 2} = \ln(3) - \ln(1) \approx 1.09861. \tag{3.63}$$

We will now apply the various methods of Table 3.1 to approximate Eq. (3.63). Note that these methods could give better results if a finer grid had been chosen. However, since this is only an illustrative example, we wanted to keep it as simple as possible. The rectangular rule, evaluated at the midpoint $x = 0$ of the interval of width $b - a = 2$, gives

$$\mathcal{I}_R = 2 \cdot \frac{1}{2} = 1, \tag{3.64}$$

the trapezoidal rule

$$\mathcal{I}_T = \frac{4}{3} = 1.333\ldots, \tag{3.65}$$

and an application of the Simpson rule yields

$$\mathcal{I}_S = \frac{10}{9} = 1.111\ldots. \tag{3.66}$$

Finally, we apply the Gauss-Legendre quadrature in a second order approximation. We could look up the parameters in [7, 8], however, for illustrative reasons we will calculate those in this simple case. For a second order approximation we need the Legendre polynomial of second degree. It can be obtained from Rodrigues' formula (3.42):

$$P_2(x) = \frac{1}{2^2\, 2!}\frac{d^2}{dx^2}\left(x^2 - 1\right)^2 = \frac{1}{2}\left(3x^2 - 1\right). \tag{3.67}$$

In a next step the zeros $x_1$ and $x_2$ of $P_2(x)$ are determined from Eq. (3.67) which results immediately in:

$$x_{1,2} = \mp\frac{1}{\sqrt{3}} \approx \mp 0.57735. \tag{3.68}$$

The weights $\omega_1$ and $\omega_2$ can now be evaluated according to Eq. (3.60):

$$\omega_i = \frac{2}{\left(1 - x_i^2\right)\left[P'_2(x_i)\right]^2}. \tag{3.69}$$

It follows from Eq. (3.67) that

$$P'_2(x) = 3x, \tag{3.70}$$

and, thus,

$$P'_2(x_1) = -\sqrt{3} \quad \text{and} \quad P'_2(x_2) = \sqrt{3}. \tag{3.71}$$

This is used to calculate the weights from Eq. (3.69):

$$\omega_1 = \omega_2 = 1. \tag{3.72}$$

We combine the results (3.68) and (3.72) to arrive at the Gauss-Legendre estimate of the integral (3.63):

$$\mathcal{I}_{GL} = \frac{1}{-\frac{1}{\sqrt{3}} + 2} + \frac{1}{\frac{1}{\sqrt{3}} + 2} = 1.090909\ldots \tag{3.73}$$

Obviously, a second order Gauss-Legendre approximation results already in a much better estimate of the integral (3.63) than the trapezoidal rule which is also of second order. It is also better than the estimate by the Simpson rule which is of third order.

3.8 Concluding Discussion


Let us briefly discuss some further aspects of numerical integration. In many cases one is confronted with improper integrals of the form

$$\int_a^{\infty} dx\, f(x), \quad \int_{-\infty}^{a} dx\, f(x), \quad \text{or} \quad \int_{-\infty}^{\infty} dx\, f(x). \tag{3.74}$$

The question arises whether or not we can treat such an integral with the methods discussed so far. The answer is yes, it is possible as we will demonstrate using the integral

$$I = \int_a^{\infty} dx\, f(x) \tag{3.75}$$

as an example; other integrals can be treated in a similar fashion. We rewrite Eq. (3.75) as

$$I = \lim_{b \to \infty} \int_a^b dx\, f(x) = \lim_{b \to \infty} I(b). \tag{3.76}$$

One now calculates $I(b_1)$ for some $b_1 > a$ and $I(b_2)$ for some $b_2 > b_1$. If $|I(b_2) - I(b_1)| < \epsilon$, where $\epsilon > 0$ is the required accuracy, the resulting value $I(b_2)$ can be regarded as the appropriate estimate to $I$. However, in many cases it is easier to perform an integral transform in order to map the infinite interval onto a finite interval. For instance, consider [9]

$$I = \int_0^{\infty} dx\, \frac{1}{\left(1 + x^2\right)^{4/3}}. \tag{3.77}$$

The transformation

$$t = \frac{1}{1 + x} \tag{3.78}$$

gives

$$I = \int_0^1 dt\, \frac{t^{2/3}}{\left[t^2 + (1 - t)^2\right]^{4/3}}. \tag{3.79}$$

Thus, we mapped the interval $[0, \infty) \to [0, 1]$. Integral (3.79) can now be approximated with the help of the methods discussed in the previous sections. These can also be applied to approximate convergent integrals whose integrand shows singular behavior within $[a, b]$.

3 Particular care is required when dealing with periodic functions!
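The mapping onto a finite interval can be tried out numerically. The sketch below is an illustration added here: it assumes the integrand $(1 + x^2)^{-4/3}$ for Eq. (3.77), applies the central rectangular rule to the transformed integrand, and compares with the closed form $\int_0^\infty dx\,(1 + x^2)^{-p} = \sqrt{\pi}\,\Gamma(p - 1/2)/[2\Gamma(p)]$, a standard result that is not taken from the text and is used only as a check. The function names and the number of intervals are arbitrary.

```python
import math

def midpoint_rule(f, a, b, n_intervals):
    """Central rectangular rule; open, so f is never evaluated at a or b."""
    h = (b - a) / n_intervals
    return h * sum(f(a + (i + 0.5) * h) for i in range(n_intervals))

def transformed_integrand(t):
    """Integrand after the substitution t = 1/(1 + x), assuming (1 + x^2)^(-4/3)."""
    return t ** (2.0 / 3.0) / (t * t + (1.0 - t) ** 2) ** (4.0 / 3.0)

estimate = midpoint_rule(transformed_integrand, 0.0, 1.0, 2000)

# Reference value from the closed form with p = 4/3 (assumption, check only).
reference = math.sqrt(math.pi) * math.gamma(5.0 / 6.0) / (2.0 * math.gamma(4.0 / 3.0))
print(estimate, reference)
```

An open rule is a convenient choice here because the transformed integrand has an unbounded derivative at $t = 0$.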

If the integrand $f(x)$ is not smooth within the interval $I: x \in [a, b]$ we can split the total integral into a sum over sub-intervals. For instance, if we consider the function

$$f(x) = \begin{cases} x\cos(x), & x < 0, \\ x\sin(x), & x \ge 0, \end{cases}$$

we can calculate the integral over the interval $I: x \in [-10, 10]$ as

$$\int_{-10}^{10} dx\, f(x) = \int_{-10}^{0} dx\, x\cos(x) + \int_{0}^{10} dx\, x\sin(x).$$

We generalize this result and write

$$\int_I dx\, f(x) = \sum_k \int_{I_k} dx\, f(x), \tag{3.80}$$

with sub-intervals $I_k \subset I$, $\forall k$, and the integrand $f(x)$ is assumed to be smooth within each sub-interval $I_k$ but not necessarily within the interval $I$. We can then apply one of the methods discussed in this chapter to calculate an estimate of the integral over any of the sub-intervals $I_k$.
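The splitting into sub-intervals can be sketched for the example above. The following code is an illustration added here; it uses a composite Simpson rule on each sub-interval (the function name `simpson` and the number of intervals are arbitrary choices) and compares with the value obtained from the antiderivatives $x\sin(x) + \cos(x)$ and $\sin(x) - x\cos(x)$.

```python
import math

def simpson(f, a, b, n_intervals):
    """Composite Simpson rule with an even number of intervals."""
    if n_intervals < 2 or n_intervals % 2:
        raise ValueError("number of intervals must be even and at least 2")
    h = (b - a) / n_intervals
    total = f(a) + f(b)
    for i in range(1, n_intervals):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return h * total / 3.0

def f(x):
    """Example integrand: continuous at x = 0, but with a kink there."""
    return x * math.cos(x) if x < 0.0 else x * math.sin(x)

# Integrate each smooth piece separately and add the results.
estimate = simpson(f, -10.0, 0.0, 200) + simpson(f, 0.0, 10.0, 200)

# Analytic value from the antiderivatives of the two pieces.
exact_left = 1.0 - (10.0 * math.sin(10.0) + math.cos(10.0))
exact_right = math.sin(10.0) - 10.0 * math.cos(10.0)
print(estimate, exact_left + exact_right)
```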

Similar to the discussion in Sect. 2.5 about the approximation of partial derivatives on the basis of finite differences, one can apply the rules of quadrature developed here for different dimensions to obtain an estimate of multi-dimensional integrals. However, the complexity of the problem is significantly increased if the integration boundaries are functions of the variables rather than constants. For instance,

$$\int_a^b dx \int_{\varphi_1(x)}^{\varphi_2(x)} dy\, f(x, y). \tag{3.81}$$

Such cases are rather difficult to handle and the method to choose depends highly on the form of the functions $\varphi_1(x)$, $\varphi_2(x)$ and $f(x, y)$. We will not deal with integrals of this kind because this is beyond the scope of this book. The interested reader is referred to books by Dahlquist and Björck [10] and by Press et al. [11].

In a final remark we would like to point out that it can be of advantage to utilize the properties of Fourier transforms when integrals of the convolution type are to be approximated numerically (see Appendix D).

Summary

The starting point was the concept of finite differences (Sect. 2.2). Based on this concept proper integrals over smooth functions $f(x)$ were approximated by a sum over elemental areas with the elemental area defined as the area under $f(x)$ between two consecutive grid-points. The simplest method, the rectangular rule, was based on forward/backward differences. It was a closed method, i.e. the functional values at the boundaries were included. On the other hand, a rectangular rule based on central differences was an open method, i.e. the functional values at the boundaries were not included. Application of the Taylor expansion (2.7) revealed that the methodological error of the rectangular rule was of order $\mathcal{O}(h^2)$. With the elemental area approximated by a trapezoid we arrived at the trapezoidal rule. It was a closed method and the methodological error was of order $\mathcal{O}(h^3)$. The inclusion of higher order derivatives of $f(x)$ allowed the derivation of the Simpson rules of quadrature. They resulted in a remarkable reduction of the methodological error. A more general formulation of all these methods was based on the interpolation of the function $f(x)$ using Lagrange interpolating polynomials of degree $n$ and resulted in the class of Newton-Cotes rules. For various orders $n$ of the interpolating polynomial all the above rules were derived. Within this context a particularly useful method, the Romberg method, was discussed. By diligently adding only two-point rules the error of the numerical estimate of the integral could be made arbitrarily small. An even more general approach was offered by the Gauss-Legendre quadrature which used Legendre polynomials of degree $\ell$ to approximate the function $f(x)$. The grid-points were defined by the zeros of the $\ell$-th degree polynomial and the weights $\omega_i$ in Eq. (3.1) were proportional to the square of the inverse first derivative of the polynomial. This method had the enormous advantage that the grid-points and weights were independent of the function $f(x)$ and, thus, could be determined once and for all for any polynomial degree $\ell$. Error analysis proved that this method had the smallest methodological error.


Problems 


We consider the interval $I = [-5, 5]$ together with the functions $g(x)$ and $h(x)$:

$$g(x) = \exp\left(-x^2\right) \quad \text{and} \quad h(x) = \sin(x).$$

We discretize the interval $I$ by introducing $N$ equally spaced grid-points. The corresponding $N - 1$ sub-intervals are denoted by $I_j$, $j = 1, \ldots, N - 1$. In the following we wish to calculate estimates of the integrals

$$\mathcal{I}_1 = \int_I dx\, g(x)$$

and

$$\mathcal{I}_2 = \int_I dx\, h(x).$$

Furthermore, we add a third integral of the form

$$\mathcal{I}_3 = \int_I dx\, h^2(x)$$

to our discussion.

1. Evaluate $\mathcal{I}_1$ with the help of the error function $\mathrm{erf}(x)$, which should be supplied by the environment you use as an intrinsic function. Note that the error function is defined as

$$\mathrm{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x dz\, \exp\left(-z^2\right). \tag{3.82}$$

Hence you should be able to express $\mathcal{I}_1$ in terms of $\mathrm{erf}(x)$.

2. Calculate $\mathcal{I}_2$ and $\mathcal{I}_3$ analytically.

3. In order to approximate $\mathcal{I}_1$, $\mathcal{I}_2$ and $\mathcal{I}_3$ with the help of the two second order methods we discussed in this chapter, employ the following strategy: First the integrals are rewritten as

$$\int_I dx\, \bullet = \sum_i \int_{I_i} dx\, \bullet,$$

where $\bullet$ is a placeholder for $g(x)$, $h(x)$ and $h^2(x)$ and $I_i$, $i = 1, \ldots, N$, are suitable intervals. In a second step the integrals are approximated with (i) the central rectangular rule and (ii) the trapezoidal rule.

4. In addition, we approximate the integrals $\mathcal{I}_1$, $\mathcal{I}_2$ and $\mathcal{I}_3$ by employing Simpson's rule for odd $N$. Here

$$\int_I dx\, \bullet = \int_{I_1 \cup I_2} dx\, \bullet + \int_{I_3 \cup I_4} dx\, \bullet + \ldots + \int_{I_{N-2} \cup I_{N-1}} dx\, \bullet$$

is used as it was discussed in Sect. 3.4.

5.  Compare  the  results  obtained  with  different  algorithms  and  different  numbers  of 
grid-points,  N.  Plot  the  absolute  and  the  relative  error  as  a  function  of  N. 

6.  Approximate  numerically  the  integral 


7.  Calculate  the  integral  over  the  function 


y 


f(x,  v)  =  exp 


2 


cos(v) 
within the intervals $x \in [-10\pi, 10\pi]$ and $y \in [-10, 10]$ with the help of an
approximation of your choice.

8. Demonstrate numerically that the line integral over closed loops $C$ of the gradient of the function $f(x, y)$ of the previous problem vanishes:

$$\oint_C d\mathbf{s} \cdot \nabla f(x, y) = 0.$$

Chapter 4

The  Kepler  Problem 


4.1  Introduction 

The  Kepler  problem  [1-6]  is  certainly  one  of  the  most  important  problems  in  the 
history  of  physics  and  natural  sciences  in  general.  We  will  study  this  problem  for 
several  reasons:  (i)  it  is  a  nice  demonstration  of  the  applicability  of  the  methods 
introduced  in  the  previous  chapters,  (ii)  important  concepts  of  the  numerical 
treatment  of  ordinary  differential  equations  can  be  introduced  quite  naturally,  and 
(iii) it allows us to revisit some of the most important aspects of classical mechanics.

The Kepler problem is a special case of the two-body problem which is discussed in Appendix A. Let us summarize the main results. We consider two point particles interacting via the rotationally symmetric two-body potential $U$ which is solely a function of the distance between the particles. The symmetries of this problem allow several simplifications: (i) The problem can be reduced to the two-dimensional motion of a point particle with reduced mass $m$ in the central potential $U$. (ii) By construction, the total energy $E$ is conserved. (iii) The length $l$ of the angular momentum vector is also conserved because of the symmetry of the potential $U$. Due to this rotational symmetry it is a natural choice to describe the particle's motion in polar coordinates $(\rho, \varphi)$.

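The final differential equations which have to be solved are of the form

$$ \dot{\varphi} = \frac{l}{m \rho^2} \tag{4.1} $$

and

$$ \dot{\rho} = \pm \sqrt{\frac{2}{m} \left[ E - U(\rho) - \frac{l^2}{2 m \rho^2} \right]} . \tag{4.2} $$

Here, one usually defines the effective potential

$$ U_{\mathrm{eff}}(\rho) = U(\rho) + \frac{l^2}{2 m \rho^2} , \tag{4.3} $$

as the sum of the interaction potential and the centrifugal barrier $U_{\mathrm{mom}}(\rho) = l^2 / (2 m \rho^2)$. Equation (4.2) can be transformed into an implicit equation for $\rho$

$$ t = t_0 \pm \int_{\rho_0}^{\rho} \mathrm{d}\rho' \left\{ \frac{2}{m} \left[ E - U_{\mathrm{eff}}(\rho') \right] \right\}^{-\frac{1}{2}} , \tag{4.4} $$

with $\rho_0 = \rho(t_0)$ the initial condition at time $t_0$. Furthermore, the angle $\varphi$ is related to the radius $\rho$ by

$$ \varphi = \varphi_0 \pm \int_{\rho_0}^{\rho} \mathrm{d}\rho' \, \frac{l}{m \rho'^2} \left\{ \frac{2}{m} \left[ E - U_{\mathrm{eff}}(\rho') \right] \right\}^{-\frac{1}{2}} , \tag{4.5} $$

with the initial condition $\varphi_0 = \varphi(t_0)$.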

The Kepler problem is defined by the gravitational interaction potential

$$ U(\rho) = - \frac{\alpha}{\rho} , \qquad \alpha > 0 . \tag{4.6} $$


For this case, we show in Fig. 4.1 schematically the effective potential (4.3) (solid black line), together with the gravitational potential $U(\rho)$ (dashed-dotted line) and the centrifugal barrier $U_{\mathrm{mom}}$ (dashed line).

Fig. 4.1 Schematic illustration of the effective potential $U_{\mathrm{eff}}(\rho)/U_{\mathrm{eff}}(\rho_0)$ vs $\rho/\rho_0$ (solid line, right hand scale). Here, $\rho_0$ is the distance of the minimum in $U_{\mathrm{eff}}(\rho)$. $U_{\mathrm{grav}}(\rho)$ (dashed-dotted line) denotes the gravitational contribution while $U_{\mathrm{mom}}(\rho)$ (dashed line) denotes the centrifugal barrier. Both potentials are normalized to $U_{\mathrm{eff}}(\rho_0)$ (left hand scale applies).

The gravitational potential (4.6) is now inserted into Eq. (4.5):

$$ \varphi = \varphi_0 \pm \int_{\rho_0}^{\rho} \mathrm{d}\rho' \, \frac{l}{m \rho'^2} \left[ \frac{2}{m} \left( E + \frac{\alpha}{\rho'} - \frac{l^2}{2 m \rho'^2} \right) \right]^{-\frac{1}{2}} . \tag{4.7} $$


The substitution $u = 1/\rho$ simplifies Eq. (4.7) to

$$ \varphi = \varphi_0 \mp \int_{u_1}^{u_2} \mathrm{d}u \left( \frac{2 m E}{l^2} + \frac{2 m \alpha}{l^2} u - u^2 \right)^{-\frac{1}{2}} , \tag{4.8} $$


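where the integration boundaries $u_1$ and $u_2$ are $1/\rho_0$ and $1/\rho$, respectively. The integral can now be evaluated with the help of a simple substitution and we obtain the angle $\varphi$ as a function of $\rho$:

$$ \varphi = \varphi_0 \pm \cos^{-1} \left( \frac{\frac{l}{\rho} - \frac{m \alpha}{l}}{\sqrt{2 m E + \frac{m^2 \alpha^2}{l^2}}} \right) + \mathrm{const} . \tag{4.9} $$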


This solution can conveniently be characterized by the introduction of two parameters, namely

$$ a = \frac{l^2}{m \alpha} \tag{4.10} $$

and the eccentricity $e$

$$ e = \sqrt{1 + \frac{2 E l^2}{m \alpha^2}} . \tag{4.11} $$


Hence, by neglecting the integration constant and setting $\varphi_0 = 0$ we arrive at

$$ \frac{a}{\rho} = 1 + e \cos(\varphi) \tag{4.12} $$

as the final form of Eq. (4.9). It describes for $e > 1$ a hyperbola, for $e = 1$ a parabola, and for $e < 1$ an ellipse. The case $e = 0$ is a special case of the ellipse and describes a circle with radius $\rho = a$. A more detailed discussion of this result, in particular the derivation of Kepler's laws, can be found in any textbook on classical mechanics [1-6]. We discuss now some numerical aspects.

Footnote 1 (on the substitution used to evaluate the integral in Eq. (4.8)): In particular, we substitute

$$ w = \left( u - \frac{m \alpha}{l^2} \right) \left( \frac{2 m E}{l^2} + \frac{m^2 \alpha^2}{l^4} \right)^{-\frac{1}{2}} . $$
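As a small illustration of Eqs. (4.10)-(4.12), the following Python sketch computes the parameters a and e from given values of E, l, m and α and classifies the resulting orbit. The function names (`orbit_parameters`, `classify_orbit`, `radius`) and the tolerance `tol` are assumptions made for this example only; they do not appear in the text.

```python
import math


def orbit_parameters(E, l, m, alpha):
    """Return (a, e) according to Eqs. (4.10) and (4.11)."""
    a = l**2 / (m * alpha)
    # max() guards against a tiny negative round-off for circular orbits
    e = math.sqrt(max(0.0, 1.0 + 2.0 * E * l**2 / (m * alpha**2)))
    return a, e


def classify_orbit(e, tol=1e-12):
    """Name the conic section described by Eq. (4.12)."""
    if e <= tol:
        return "circle"
    if abs(e - 1.0) <= tol:
        return "parabola"
    return "ellipse" if e < 1.0 else "hyperbola"


def radius(phi, a, e):
    """Radius rho(phi) obtained by solving Eq. (4.12) for rho."""
    return a / (1.0 + e * math.cos(phi))


if __name__ == "__main__":
    # scaled units with m = alpha = l = 1 (an arbitrary choice for the demo)
    for E in (-0.5, -0.25, 0.0, 0.5):
        a, e = orbit_parameters(E, l=1.0, m=1.0, alpha=1.0)
        print(E, a, e, classify_orbit(e))
```

In these scaled units the energy E = -1/2 is the minimum of the effective potential, which is why it yields the circular orbit.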


4.2  Numerical  Treatment 

In the previous section we solved the Kepler problem by evaluating the integral (4.7), which expresses the angle $\varphi$ as a function of the radius $\rho$. In this section, however, we aim at solving the integral equation (4.4) numerically with the help of the methods discussed in the previous chapter. Remember that Eq. (4.4) expresses the time $t$ as a function of the radius $\rho$. This equation has to be inverted in order to obtain $\rho(t)$, which, in turn, is then inserted into Eq. (4.1) in order to determine the angle $\varphi(t)$ as a function of time. This discussion will lead us in a natural way to the most common techniques applied to solve ordinary differential equations, which comes as no surprise since Eq. (4.4) is the integral representation of Eq. (4.2).

We give a short outline of what we plan to do: We discretize the time axis in equally spaced time steps $\Delta t$, i.e. $t_n = t_0 + n \Delta t$. Accordingly, we define the radius $\rho$ at time $t = t_n$ as $\rho(t_n) \equiv \rho_n$. We can use the methods introduced in Chap. 3 to approximate the integral (4.4) from some $\rho_n$ to $\rho_{n+1}$. According to that chapter the absolute error introduced will behave like $\delta = |\rho_n - \rho_{n+1}|^K$ where the explicit value of $K$ depends on the method used. However, since the radius $\rho$ changes continuously with time $t$ we know that for sufficiently small values of $\Delta t$ the error $\delta$ will also become arbitrarily small. If we start from some initial values $t_0$ and $\rho_0$, we can successively calculate the values $\rho_1, \rho_2, \ldots$, by applying a small time step $\Delta t$.

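Let us start by rewriting Eq. (4.4) as:

$$ t - t_0 = \int_{\rho_0}^{\rho} \mathrm{d}\rho' \, f(\rho') . \tag{4.13} $$

As we discretized the time axis in equally spaced increments and defined $\rho_n = \rho(t_n)$, we can rewrite (4.13) as

$$ \Delta t = t_{n+1} - t_n = \int_{\rho_n}^{\rho_{n+1}} \mathrm{d}\rho' \, f(\rho') . \tag{4.14} $$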


The forward rectangular rule, (3.9), results in the approximation

$$ \Delta t = (\rho_{n+1} - \rho_n) f(\rho_n) . \tag{4.15} $$

We solve this equation for $\rho_{n+1}$ and obtain

$$ \rho_{n+1} = h(\rho_n) \Delta t + \rho_n , \tag{4.16} $$

where we defined

$$ h(\rho) = \frac{1}{f(\rho)} = \sqrt{\frac{2}{m} \left[ E - U_{\mathrm{eff}}(\rho) \right]} , \tag{4.17} $$

following Eqs. (4.2) and (4.3). As Eq. (4.4) is the integral representation of the ordinary differential equation (4.2), approximation (4.16) corresponds to the approximation

$$ D_+ \rho_n = h(\rho_n) , \tag{4.18} $$

where $D_+ \rho_n$ is the forward difference derivative (2.10a). Since the right hand side of the discretized differential equation (4.18) is independent of $\rho_{n+1}$, this method is referred to as an explicit method. In particular, consider an ordinary differential equation of the form

$$ \dot{y} = F(y) . \tag{4.19} $$

Then the approximation method is referred to as an explicit Euler method if it is of the form

$$ y_{n+1} = y_n + F(y_n) \Delta t . \tag{4.20} $$

Note that $y$ might be a vector.
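The explicit Euler update (4.20) can be sketched in a few lines of Python. The test problem (exponential decay with $F(y) = -y$), the function names and the choice of storing $y$ as a list of floats are assumptions of this illustration and are not taken from the text.

```python
import math


def explicit_euler(F, y0, dt, n_steps):
    """Iterate y_{n+1} = y_n + F(y_n) * dt, Eq. (4.20).

    y is stored as a list of floats, so it may be a vector.
    """
    y = list(y0)
    for _ in range(n_steps):
        dy = F(y)
        y = [yi + dyi * dt for yi, dyi in zip(y, dy)]
    return y


def decay(y):
    """Right-hand side F(y) = -y of the test problem (exact solution exp(-t))."""
    return [-yi for yi in y]


def euler_error(dt):
    """Absolute error of the explicit Euler solution at t = 1."""
    n_steps = round(1.0 / dt)
    return abs(explicit_euler(decay, [1.0], dt, n_steps)[0] - math.exp(-1.0))


if __name__ == "__main__":
    for dt in (0.01, 0.005, 0.0025):
        print(dt, euler_error(dt))
```

Halving the time step should roughly halve the error at a fixed final time, which reflects the first order accuracy of the underlying forward rectangular rule.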

Let us use the backward rectangular rule (3.10) to solve Eq. (4.14). We obtain

$$ \Delta t = (\rho_{n+1} - \rho_n) f(\rho_{n+1}) , \tag{4.21} $$

or equivalently

$$ \rho_{n+1} = \rho_n + h(\rho_{n+1}) \Delta t . \tag{4.22} $$

Again, this corresponds to an approximation of the differential equation (4.2) by

$$ D_- \rho_{n+1} = h(\rho_{n+1}) , \tag{4.23} $$

where $D_- \rho_{n+1}$ is the backward difference derivative (2.10b). In this case the quantity of interest $\rho_{n+1}$ still appears in the argument of the function $h(\rho)$ and Eq. (4.22) is an implicit equation for $\rho_{n+1}$ which has to be solved. In general, if the problem (4.19) is approximated by an algorithm of the form

$$ y_{n+1} = y_n + F(y_{n+1}) \Delta t , \tag{4.24} $$

it is referred to as an implicit Euler method. Note that the implicit equation (4.24) might be analytically unsolvable. Hence, one has to employ a numerical method to solve (4.24) which will also imply a numerical error. However, in the particular case of Eq. (4.22) we can solve it analytically since it can be rewritten as a fourth order polynomial equation in $\rho_{n+1}$ of the form

$$ \rho_{n+1}^4 - 2 \rho_n \rho_{n+1}^3 + \left( \rho_n^2 - \frac{2 E \Delta t^2}{m} \right) \rho_{n+1}^2 - \frac{2 \alpha \Delta t^2}{m} \rho_{n+1} + \frac{l^2 \Delta t^2}{m^2} = 0 . \tag{4.25} $$

The solution of this equation is quite tedious and will not be discussed here; however, the method one employs is referred to as Ferrari's method [7].
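When the implicit equation (4.24) cannot be solved in closed form, one simple numerical option is a fixed-point iteration started from an explicit Euler guess. The sketch below assumes a scalar $y$ and a time step small enough that the iteration contracts (roughly $\Delta t \, |F'(y)| < 1$); the iteration scheme, the tolerance and all names are assumptions of this example, not a method prescribed in the text.

```python
def implicit_euler_step(F, y_n, dt, tol=1e-12, max_iter=100):
    """Solve y_{n+1} = y_n + F(y_{n+1}) * dt, Eq. (4.24), for a scalar y."""
    y_next = y_n + F(y_n) * dt          # explicit Euler step as initial guess
    for _ in range(max_iter):
        y_new = y_n + F(y_next) * dt    # fixed-point update
        if abs(y_new - y_next) < tol:
            return y_new
        y_next = y_new
    raise RuntimeError("fixed-point iteration did not converge")


def implicit_euler(F, y0, dt, n_steps):
    """Apply n_steps implicit Euler steps of size dt starting from y0."""
    y = y0
    for _ in range(n_steps):
        y = implicit_euler_step(F, y, dt)
    return y


if __name__ == "__main__":
    # For F(y) = -y the implicit equation has the closed form y_n / (1 + dt).
    print(implicit_euler_step(lambda y: -y, 1.0, 0.1), 1.0 / 1.1)
```

For stiff problems the contraction condition fails and a Newton-type solver would be the more common choice; the fixed-point variant is used here only because it needs no derivative of $F$.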

A natural way to proceed is to regard the central rectangular rule (3.13) in a next step. Within this approximation we obtain for Eq. (4.14)

$$ \Delta t = (\rho_{n+1} - \rho_n) f\left( \frac{\rho_{n+1} + \rho_n}{2} \right) , \tag{4.26} $$

which is equivalent to the implicit equation

$$ \rho_{n+1} = \rho_n + h\left( \frac{\rho_n + \rho_{n+1}}{2} \right) \Delta t . \tag{4.27} $$

It can be written as an approximation to Eq. (4.2) with the help of the central difference derivative $D_c \rho_{n+\frac{1}{2}}$:

$$ D_c \rho_{n+\frac{1}{2}} = h\left( \frac{\rho_{n+1} + \rho_n}{2} \right) . \tag{4.28} $$

In general, for a problem of the form (4.19) a method of the form

$$ y_{n+1} = y_n + F\left( \frac{y_{n+1} + y_n}{2} \right) \Delta t , \tag{4.29} $$

is referred to as the implicit midpoint rule. We note that this method might be more accurate since the error of the central rectangular rule scales like $\mathcal{O}(\Delta t^2)$ while the error of the forward and backward rectangular rules scales like $\mathcal{O}(\Delta t)$. Although the method is implicit, in case of the Kepler problem one can solve the implicit equation (4.27) analytically for $\rho_{n+1}$, which is certainly of advantage.
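A minimal Python sketch of the implicit midpoint rule (4.29) is given below. Instead of the Kepler problem it uses, as an assumption of this example, the harmonic oscillator $\dot{x} = v$, $\dot{v} = -x$, because for this linear test problem the rule is known to conserve the quantity $(x^2 + v^2)/2$ up to the accuracy of the implicit solve. The fixed-point solver, the tolerance and all names are likewise illustrative choices.

```python
def implicit_midpoint_step(F, y_n, dt, tol=1e-13, max_iter=200):
    """Solve y_{n+1} = y_n + F((y_n + y_{n+1}) / 2) * dt, Eq. (4.29)."""
    y_next = [yi + fi * dt for yi, fi in zip(y_n, F(y_n))]   # Euler guess
    for _ in range(max_iter):
        y_mid = [0.5 * (a + b) for a, b in zip(y_n, y_next)]
        y_new = [yi + fi * dt for yi, fi in zip(y_n, F(y_mid))]
        change = max(abs(a - b) for a, b in zip(y_new, y_next))
        y_next = y_new
        if change < tol:
            return y_next
    raise RuntimeError("fixed-point iteration did not converge")


def oscillator(y):
    """Right-hand side of the harmonic oscillator, y = [x, v]."""
    x, v = y
    return [v, -x]


def energy_drift(dt, n_steps):
    """Change of (x^2 + v^2) / 2 after n_steps implicit midpoint steps."""
    y = [1.0, 0.0]
    e0 = 0.5 * (y[0] ** 2 + y[1] ** 2)
    for _ in range(n_steps):
        y = implicit_midpoint_step(oscillator, y, dt)
    return abs(0.5 * (y[0] ** 2 + y[1] ** 2) - e0)


if __name__ == "__main__":
    print(energy_drift(0.1, 1000))
```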

In this chapter the Kepler problem was instrumental in introducing three common methods which can be employed to solve ordinary differential equations of the form (4.19) numerically. More general and advanced methods to solve ordinary differential equations and a more systematic description of these methods will be offered in the next chapter.

However, let us discuss another point before proceeding to the chapter on the numerics of ordinary differential equations. As demonstrated in Sect. 1.3 the approximation of the integral (4.4) involves a numerical error. What will be the consequence of this error? Since we demonstrated that the approximations we discussed result in a differential equation in finite difference form, i.e. Eqs. (4.18), (4.23), and (4.27), we know that the derivative $\dot{\rho}$ will exhibit an error. Consequently, energy conservation [see Appendix A, Eq. (A.27)] will be violated with the implication that deviations from the trajectory (4.12) can be expected. This is definitely not desirable.

A solution is provided by a special class of methods, known as symplectic integrators, which were specifically designed for such cases. They are based on a formulation of the problem using Hamilton's equations of motion. (See, for instance, Refs. [1-5].) In the particular case of the Kepler problem the Hamilton function is equivalent to the total energy of the system and reads (in some scaled units):

$$ H(p, q) = \frac{1}{2} \left( p_1^2 + p_2^2 \right) - \frac{1}{\sqrt{q_1^2 + q_2^2}} . \tag{4.30} $$

Here $p = (p_1, p_2)$ are the generalized momentum coordinates of the point particle in the two-dimensional plane and $q = (q_1, q_2)$ are the generalized position coordinates. From this Hamilton's equations of motion

$$ \begin{pmatrix} \dot{q} \\ \dot{p} \end{pmatrix} = \begin{pmatrix} \nabla_p H(p, q) \\ - \nabla_q H(p, q) \end{pmatrix} = \begin{pmatrix} a(q, p) \\ b(q, p) \end{pmatrix} , \tag{4.31} $$

follow, where the functions $a(q, p)$ and $b(q, p)$ have been introduced for a more convenient notation. Note that these functions are two-dimensional vectors in the case of Kepler's problem. The so called symplectic Euler method is given by

$$ \begin{aligned} q_{n+1} &= q_n + a(q_n, p_{n+1}) \Delta t , \\ p_{n+1} &= p_n + b(q_n, p_{n+1}) \Delta t . \end{aligned} \tag{4.32} $$

Obviously, the first equation is explicit while the second is implicit. An alternative formulation reads

$$ \begin{aligned} q_{n+1} &= q_n + a(q_{n+1}, p_n) \Delta t , \\ p_{n+1} &= p_n + b(q_{n+1}, p_n) \Delta t , \end{aligned} \tag{4.33} $$

where the first equation is implicit and the second equation is explicit. Of course, Eq. (4.31) may be solved with the help of the explicit Euler method (4.20), the implicit Euler method (4.24) or the implicit midpoint rule (4.29). The solution should be equivalent to solving Eq. (4.4) with the respective method and then calculating (4.1) successively. Again, a more systematic discussion of symplectic integrators can be found in the following chapters.

Let us conclude this chapter with a final remark. We decided to solve Eqs. (4.4) and (4.1) because we wanted to reproduce the dynamics of the system, i.e. we wanted to obtain $\rho(t)$ and $\varphi(t)$. This directed us to the numerical solution of two integrals. If we wanted to employ symplectic methods, which provide several advantages, we would have to solve four differential equations (4.31) instead of two integrals. Moreover, if we are not interested in the time evolution of the system but in the form of the trajectory in general, we could simply evaluate the integral (4.5) analytically or, if an analytical solution is not feasible for the potential $U(\rho)$ one is interested in, numerically. Methods to approximate such an integral were extensively discussed in Chap. 3.
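The difference between the symplectic Euler method (4.32) and the explicit Euler method (4.20) can be made visible with a short Python sketch for the Hamilton function (4.30). For this Hamiltonian $a(q, p) = p$ and $b(q, p) = -q/|q|^3$, so the formally implicit momentum update of (4.32) can be evaluated directly. The initial condition (a circular orbit with $q = (1, 0)$, $p = (0, 1)$), the step size and all function names are assumptions of this example.

```python
import math


def energy(q, p):
    """Hamilton function (4.30) in scaled units."""
    return 0.5 * (p[0] ** 2 + p[1] ** 2) - 1.0 / math.hypot(q[0], q[1])


def force(q):
    """b(q, p) = -grad_q H = -q / |q|^3 (independent of p)."""
    r3 = math.hypot(q[0], q[1]) ** 3
    return (-q[0] / r3, -q[1] / r3)


def symplectic_euler_step(q, p, dt):
    """Eq. (4.32): update p first, then use the new p to update q."""
    b = force(q)
    p_new = (p[0] + b[0] * dt, p[1] + b[1] * dt)
    q_new = (q[0] + p_new[0] * dt, q[1] + p_new[1] * dt)
    return q_new, p_new


def explicit_euler_step(q, p, dt):
    """Eq. (4.20) applied to (4.31): both updates use the old values."""
    b = force(q)
    q_new = (q[0] + p[0] * dt, q[1] + p[1] * dt)
    p_new = (p[0] + b[0] * dt, p[1] + b[1] * dt)
    return q_new, p_new


def max_energy_drift(step, q, p, dt, n_steps):
    """Largest deviation of the energy from its initial value."""
    e0 = energy(q, p)
    drift = 0.0
    for _ in range(n_steps):
        q, p = step(q, p, dt)
        drift = max(drift, abs(energy(q, p) - e0))
    return drift


if __name__ == "__main__":
    start = ((1.0, 0.0), (0.0, 1.0))     # circular orbit, E = -1/2
    print(max_energy_drift(symplectic_euler_step, *start, 0.01, 5000))
    print(max_energy_drift(explicit_euler_step, *start, 0.01, 5000))
```

With these settings the energy error of the symplectic variant stays small and bounded, while the explicit Euler orbit spirals outwards and its energy drifts steadily.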


Summary 

Kepler's two-body problem was used as an incentive to introduce intuitively numerical methods to solve ordinary first order differential equations. To serve this purpose the basic differential equations were transformed into integral form. These integrals were then solved with the help of the rules discussed in Sect. 3.2. Three basic methods have been identified, namely the explicit Euler method (based on the forward difference derivative), the implicit Euler method (based on the backward difference derivative), and the implicit midpoint rule (based on the central rectangular rule). Shortcomings of such methods have been discussed briefly, as were remedies to overcome these.


Problems 

1. Planetary-Orbits: Apply the methods of numerical integration to the integral (4.4) and compare it to the analytical result. Identify the three different cases of elliptic, parabolic or hyperbolic orbits by varying the initial conditions.

2. Lennard-Jones Scattering: Consider the scattering of two point particles which interact via the Lennard-Jones potential $U(r) = 4\sigma[(\varepsilon/r)^{12} - (\varepsilon/r)^6]$ with $\sigma, \varepsilon > 0$ (see Chapter 7). Calculate the orbit $\varphi(\rho)$.

3. Harmonic-Motion: Consider the motion of a point particle in the radial harmonic oscillator $U(\rho) = m \omega^2 \rho^2 / 2$. According to Bertrand's theorem (see Refs. [1-3]) the particle's trajectories should be closed orbits. Demonstrate this numerically as well as analytically.

Chapter 5

Ordinary  Differential  Equations:  Initial  Value 
Problems 


5.1  Introduction 

This chapter introduces common numeric methods designed to solve initial value problems. The discussion of the Kepler problem in the previous chapter allowed the introduction of three concepts, namely the implicit Euler method, the explicit Euler method, and the implicit midpoint rule. Furthermore, we mentioned the symplectic Euler method. In this chapter we plan to put these methods into a more general context and to discuss more advanced techniques.

Let us define the problem: We consider initial value problems of the form

$$ \begin{cases} \dot{y}(t) = f(y, t) , \\ y(0) = y_0 , \end{cases} \tag{5.1} $$

where $y(t) \equiv y$ is an $n$-dimensional vector and $y_0$ is referred to as the initial value of $y$. Some remarks about the form of Eq. (5.1) are required:

(i) We note that by posing Eq. (5.1), we assume that the differential equation is explicit in $\dot{y}$, i.e. initial value problems of the form

$$ \begin{cases} G(\dot{y}) = f(y, t) , \\ y(0) = y_0 , \end{cases} \tag{5.2} $$

are only considered if $G(\dot{y})$ is analytically invertible. For instance, we will not deal with differential equations of the form

$$ \dot{y} + \log(\dot{y}) = 1 . \tag{5.3} $$


©  Springer  International  Publishing  Switzerland  2016 

B.A.  Stickler,  E.  Schachinger,  Basic  Concepts  in  Computational  Physics , 

DOI  10.1007/978-3-319-27265-8  5 



(ii) We note that Eq. (5.1) is a first order differential equation in $y$. However, this is in fact not a restriction since we can transform every explicit differential equation of order $n$ into a coupled set of explicit first order differential equations. Let us demonstrate this. We regard an explicit differential equation of the form

$$ y^{(n)} = f(t, y, \dot{y}, \ddot{y}, \ldots, y^{(n-1)}) , \tag{5.4} $$

where we defined $y^{(k)} \equiv \mathrm{d}^k y / \mathrm{d}t^k$. With $y_1 \equiv y$ this equation is equivalent to the set

$$ \begin{aligned} \dot{y}_1 &= y_2 , \\ \dot{y}_2 &= y_3 , \\ &\;\;\vdots \\ \dot{y}_{n-1} &= y_n , \\ \dot{y}_n &= f(t, y_1, y_2, \ldots, y_n) , \end{aligned} \tag{5.5} $$

which can be written as Eq. (5.1). Hence, we can attenuate the criterion discussed in point (i), that the differential equation has to be explicit in $\dot{y}$, to the criterion that the differential equation of order $n$ has to be explicit in the $n$-th derivative of $y$, namely $y^{(n)}$.
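The reduction (5.4) to (5.5) is mechanical and can be sketched as a small Python helper. The name `first_order_system` and the calling convention `f(t, y, y', ..., y^(n-1))` are assumptions of this example.

```python
def first_order_system(f, n):
    """Return rhs(t, Y) for Y = [y_1, ..., y_n] = [y, y', ..., y^(n-1)], Eq. (5.5)."""
    def rhs(t, Y):
        if len(Y) != n:
            raise ValueError("state vector must have length n")
        # y_1' = y_2, ..., y_{n-1}' = y_n, and y_n' = f(t, y_1, ..., y_n)
        return list(Y[1:]) + [f(t, *Y)]
    return rhs


if __name__ == "__main__":
    # second order example: y'' = -y becomes y_1' = y_2, y_2' = -y_1
    oscillator = first_order_system(lambda t, y, v: -y, 2)
    print(oscillator(0.0, [1.0, 0.0]))
```

The returned function has exactly the form required by Eq. (5.1), so any of the integrators discussed in this chapter can be applied to it.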

There is another point that needs to be discussed before moving on. The numerical treatment of initial value problems is of eminent importance in physics because many differential equations, which appear unspectacular at first glance, cannot be solved analytically. For instance, consider a first order differential equation of the type:

$$ \dot{y} = t^2 + y^2 . \tag{5.6} $$

Although this equation appears to be simple, one has to rely on numerical methods to obtain a solution. However, Eq. (5.6) is not well posed since the solution is ambiguous as long as no initial values are given. A numerical solution is only possible if the problem is completely defined. In many cases, one uses numerical methods although the problem is solvable with the help of analytic methods, simply because the solution would be too complicated. A numerical approach might be justified; however, one should always remember that, quote [1]:

Numerical methods are no excuse for poor analysis.

This chapter will be augmented by a chapter on the double pendulum, which will serve as a demonstration of the applicability of Runge-Kutta methods, and by a chapter on molecular dynamics, which will demonstrate the applicability of the leap-frog algorithm.

5.2 Simple Integrators

We start by reintroducing the methods already discussed in the previous chapter. Again, we discretize the time coordinate $t$ via the relation $t_n = t_0 + n \Delta t$ and define $f_n \equiv f(t_n)$ accordingly. In the following we will refrain from noting the initial condition explicitly for a more compact notation. We investigate Eq. (5.1) at some particular time $t_n$:

$$ \dot{y}_n = f(y_n, t_n) . \tag{5.7} $$

Integrating both sides of Eq. (5.1) over the interval $[t_n, t_{n+1}]$ gives

$$ y_{n+1} = y_n + \int_{t_n}^{t_{n+1}} \mathrm{d}t' \, f[y(t'), t'] . \tag{5.8} $$

Note that Eq. (5.8) is exact and it will be our starting point in the discussion of several paths to a numeric solution of initial value problems. These solutions will be based on an approximation of the integral on the right hand side of Eq. (5.8) with the help of the methods already discussed in Chap. 3.

In the following we list four of the best known simple integration methods for initial value problems:

(1) Applying the forward rectangular rule (3.9) to Eq. (5.8) yields

$$ y_{n+1} = y_n + f(y_n, t_n) \Delta t + \mathcal{O}(\Delta t^2) , \tag{5.9} $$

which is the explicit Euler method we encountered already in Sect. 4.2. This method is also referred to as the forward Euler method. In accordance with the forward rectangular rule, the leading term of the error of this method is proportional to $\Delta t^2$ as was pointed out in Sect. 3.2.

(2) We use the backward rectangular rule (3.10) in Eq. (5.8) and obtain

$$ y_{n+1} = y_n + f(y_{n+1}, t_{n+1}) \Delta t + \mathcal{O}(\Delta t^2) , \tag{5.10} $$

which is the implicit Euler method, also referred to as backward Euler method. As already highlighted in Sect. 4.2, it may be necessary to solve Eq. (5.10) numerically for $y_{n+1}$. (Some notes on the numeric solution of non-linear equations can be found in Appendix B.)

(3) The central rectangular rule (3.13) approximates Eq. (5.8) by

$$ y_{n+1} = y_n + f(y_{n+\frac{1}{2}}, t_{n+\frac{1}{2}}) \Delta t + \mathcal{O}(\Delta t^3) , \tag{5.11} $$

and, applied to the interval $[t_{n-1}, t_{n+1}]$ of length $2 \Delta t$, this rule takes the form:

$$ y_{n+1} = y_{n-1} + 2 f(y_n, t_n) \Delta t + \mathcal{O}(\Delta t^3) . \tag{5.12} $$

This method is sometimes referred to as the leap-frog routine or Störmer-Verlet method. We will come back to this point in Chap. 7. Note that the approximation

$$ y_{n+\frac{1}{2}} \approx \frac{y_n + y_{n+1}}{2} , \tag{5.13} $$

in Eq. (5.11) gives the implicit midpoint rule as it was introduced in Sect. 4.2.

(4) Employing the trapezoidal rule (3.15) in an approximation to Eq. (5.8) yields

$$ y_{n+1} = y_n + \frac{\Delta t}{2} \left[ f(y_n, t_n) + f(y_{n+1}, t_{n+1}) \right] + \mathcal{O}(\Delta t^3) . \tag{5.14} $$

This is an implicit method which has to be solved for $y_{n+1}$. It is generally known as the Crank-Nicolson method [2] or simply as trapezoidal method.
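The four rules (5.9), (5.10), (5.12) and (5.14) can be compared on the linear test equation $\dot{y} = \lambda y$, for which the implicit equations can be solved for $y_{n+1}$ in closed form. The Python sketch below is an illustration under this assumption; the test equation, the starting value used for the two-step leap-frog rule and all names are choices made for the example, not part of the text.

```python
import math


def integrate_linear(lam, y0, dt, n_steps, method):
    """Integrate dy/dt = lam * y over n_steps >= 1 steps of size dt."""
    z = lam * dt
    if method == "leap_frog":
        # Two-step rule (5.12); the starting value y_1 is taken from a
        # second order Taylor step (an assumption of this sketch).
        y_prev, y = y0, y0 * (1.0 + z + 0.5 * z * z)
        for _ in range(n_steps - 1):
            y_prev, y = y, y_prev + 2.0 * z * y
        return y
    factors = {
        "explicit_euler": 1.0 + z,                             # Eq. (5.9)
        "implicit_euler": 1.0 / (1.0 - z),                     # Eq. (5.10) solved for y_{n+1}
        "crank_nicolson": (1.0 + 0.5 * z) / (1.0 - 0.5 * z),   # Eq. (5.14) solved for y_{n+1}
    }
    y = y0
    for _ in range(n_steps):
        y *= factors[method]
    return y


def error_at_one(method, dt=0.01):
    """Absolute error at t = 1 for lam = -1, y(0) = 1 (exact: exp(-1))."""
    n_steps = round(1.0 / dt)
    return abs(integrate_linear(-1.0, 1.0, dt, n_steps, method) - math.exp(-1.0))


if __name__ == "__main__":
    for name in ("explicit_euler", "implicit_euler", "crank_nicolson", "leap_frog"):
        print(name, error_at_one(name))
```

For this test problem the two Euler variants show errors of similar size, while the leap-frog and Crank-Nicolson rules are markedly more accurate at the same step size.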

Methods (1), (2), and (4) are also known as one-step methods, since only function values at times $t_n$ and $t_{n+1}$ are used to propagate in time. In contrast, the leap-frog method is already a multi-step method since three different times appear in the expression. Basically, there are three different strategies to improve these rather simple methods:

• Taylor series methods: Use more terms in the Taylor expansion of $y_{n+1}$.

• Linear Multi-Step methods: Use data from previous time steps $y_k$, $k < n$, in order to cancel terms in the truncation error.

• Runge-Kutta methods: Use intermediate points within one time step.

We will briefly discuss the first two alternatives and then turn our attention to the Runge-Kutta methods in the next section.


Taylor  Series  Methods 


From Chap. 2 we are already familiar with the Taylor expansion (2.7) of the function $y_{n+1}$ around the point $y_n$,

$$ y_{n+1} = y_n + \Delta t \, \dot{y}_n + \frac{\Delta t^2}{2} \ddot{y}_n + \mathcal{O}(\Delta t^3) . \tag{5.15} $$

We insert Eq. (5.7) into Eq. (5.15) and obtain

$$ y_{n+1} = y_n + \Delta t f(y_n, t_n) + \frac{\Delta t^2}{2} \ddot{y}_n + \mathcal{O}(\Delta t^3) . \tag{5.16} $$

So far nothing has been gained: $\ddot{y}_n$ is still unknown, and dropping this term leaves a truncation error proportional to $\Delta t^2$. However, calculating $\ddot{y}_n$ with the help of Eq. (5.7) gives

$$ \ddot{y}_n = \frac{\mathrm{d}}{\mathrm{d}t} f(y_n, t_n) = \dot{f}(y_n, t_n) + f'(y_n, t_n) \dot{y}_n = \dot{f}(y_n, t_n) + f'(y_n, t_n) f(y_n, t_n) , \tag{5.17} $$

where $\dot{f} = \partial f / \partial t$ and $f' = \partial f / \partial y$, and this results together with Eq. (5.16) in:

$$ y_{n+1} = y_n + \Delta t f(y_n, t_n) + \frac{\Delta t^2}{2} \left[ \dot{f}(y_n, t_n) + f'(y_n, t_n) f(y_n, t_n) \right] + \mathcal{O}(\Delta t^3) . \tag{5.18} $$

This manipulation reduced the local truncation error to orders of $\Delta t^3$. The derivatives of $f(y_n, t_n)$, $f'(y_n, t_n)$ and $\dot{f}(y_n, t_n)$, can be approximated with the help of the methods discussed in Chap. 2, if an analytic differentiation is not feasible. The above procedure can be repeated up to arbitrary order in the Taylor expansion (5.15).
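The second order Taylor step (5.18) is easy to try out when the partial derivatives of $f$ are available analytically. The sketch below uses, as an assumption of this example, the test problem $\dot{y} = -2 t y$, $y(0) = 1$ with exact solution $\exp(-t^2)$, for which $\partial f / \partial t = -2 y$ and $\partial f / \partial y = -2 t$; the function names are hypothetical.

```python
import math


def taylor2_step(f, f_t, f_y, y, t, dt):
    """One step of Eq. (5.18); f_t and f_y are the partial derivatives of f."""
    fn = f(y, t)
    return y + dt * fn + 0.5 * dt ** 2 * (f_t(y, t) + f_y(y, t) * fn)


def taylor2_error(dt):
    """Error at t = 1 for dy/dt = -2 t y, y(0) = 1 (exact: exp(-t^2))."""
    f = lambda y, t: -2.0 * t * y
    f_t = lambda y, t: -2.0 * y       # partial derivative with respect to t
    f_y = lambda y, t: -2.0 * t       # partial derivative with respect to y
    y = 1.0
    n_steps = round(1.0 / dt)
    for n in range(n_steps):
        y = taylor2_step(f, f_t, f_y, y, n * dt, dt)
    return abs(y - math.exp(-1.0))


if __name__ == "__main__":
    for dt in (0.02, 0.01, 0.005):
        print(dt, taylor2_error(dt))
```

Halving the step size should reduce the error at the fixed final time by roughly a factor of four, as expected when a local error of order $\Delta t^3$ accumulates over $1/\Delta t$ steps.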


Linear  Multi-step  Methods 


A $k$-th order linear multi-step method is defined by the approximation

$$ y_{n+1} = \sum_{j=0}^{k} a_j y_{n-j} + \Delta t \sum_{j=0}^{k+1} b_j f(y_{n+1-j}, t_{n+1-j}) , \tag{5.19} $$

of Eq. (5.8). The coefficients $a_j$ and $b_j$ have to be determined in such a way that the local truncation error is reduced. Two of the best known techniques are the so called second order Adams-Bashforth method

$$ y_{n+1} = y_n + \frac{\Delta t}{2} \left[ 3 f(y_n, t_n) - f(y_{n-1}, t_{n-1}) \right] , \tag{5.20} $$

and the second order rule (backward differentiation formula)

$$ y_{n+1} = \frac{4}{3} y_n - \frac{1}{3} y_{n-1} + \frac{2 \Delta t}{3} f(y_{n+1}, t_{n+1}) . \tag{5.21} $$

(For details please consult Refs. [3-5].)

We note in passing that the backward differentiation formula of arbitrary order can easily be obtained with the help of the operator technique introduced in Sect. 2.4, Eq. (2.27). One simply inserts the backward difference series (2.27) to arbitrary order into the right hand side of the differential equation (5.7).

In many cases, multi-step methods are based on the interpolation of previously computed values $y_k$ by Lagrange polynomials. This interpolation is then inserted into Eq. (5.8) and integrated. However, a detailed discussion of such procedures is beyond the scope of this book. The interested reader is referred to Refs. [6, 7].

Nevertheless, let us make one last point. We note that Eq. (5.19) is explicit for $b_0 = 0$ and implicit for $b_0 \neq 0$. In many numerical realizations one combines implicit and explicit multi-step methods in such a way that the explicit result [solve Eq. (5.19) with $b_0 = 0$] is used as a guess to solve the implicit equation [solve Eq. (5.19) with $b_0 \neq 0$]. Hence, the explicit method predicts the value $y_{n+1}$ and the implicit method corrects it. Such methods yield very good results and are commonly referred to as predictor-corrector methods [8].
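A predictor-corrector scheme of this kind can be sketched in Python as follows. As one possible pairing (an assumption of this example, not a combination prescribed by the text) the second order Adams-Bashforth rule (5.20) serves as predictor and the trapezoidal rule (5.14) as corrector, applied once per step. Since a two-step rule needs two starting values, $y_1$ is generated here by one explicit midpoint step, which is again an illustrative choice.

```python
import math


def predictor_corrector(f, y0, t0, dt, n_steps):
    """Integrate dy/dt = f(y, t) over n_steps >= 1 steps of size dt."""
    # Starting value y_1 from one explicit midpoint step (assumption).
    y_prev, t_prev = y0, t0
    y = y0 + dt * f(y0 + 0.5 * dt * f(y0, t0), t0 + 0.5 * dt)
    t = t0 + dt
    for n in range(1, n_steps):
        f_n, f_prev = f(y, t), f(y_prev, t_prev)
        y_pred = y + 0.5 * dt * (3.0 * f_n - f_prev)        # predict, Eq. (5.20)
        t_next = t0 + (n + 1) * dt
        y_next = y + 0.5 * dt * (f_n + f(y_pred, t_next))   # correct, Eq. (5.14)
        y_prev, t_prev, y, t = y, t, y_next, t_next
    return y


def pc_error(dt):
    """Error at t = 1 for dy/dt = -y, y(0) = 1 (exact: exp(-1))."""
    n_steps = round(1.0 / dt)
    y = predictor_corrector(lambda y, t: -y, 1.0, 0.0, dt, n_steps)
    return abs(y - math.exp(-1.0))


if __name__ == "__main__":
    for dt in (0.02, 0.01, 0.005):
        print(dt, pc_error(dt))
```

Because the corrector is evaluated with the predicted value instead of being solved exactly, every step remains explicit while the overall scheme keeps second order accuracy.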


5.3  Runge-Kutta  Methods 

In contrast to linear multi-step methods, the idea in Runge-Kutta methods (see, for instance, Ref. [6]) is to improve the accuracy by calculating intermediate grid-points within the interval $[t_n, t_{n+1}]$. We note that the approximation (5.11) resulting from the central rectangular rule is already such a method since the function value $y_{n+\frac{1}{2}}$ at the grid-point $t_{n+\frac{1}{2}} = t_n + \Delta t / 2$ is taken into account. We investigate this in more detail and rewrite Eq. (5.11):

$$ y_{n+1} = y_n + f(y_{n+\frac{1}{2}}, t_{n+\frac{1}{2}}) \Delta t + \mathcal{O}(\Delta t^3) . \tag{5.22} $$

We now have to find appropriate approximations to $y_{n+\frac{1}{2}}$ which will increase the accuracy of Eq. (5.11). Our first choice is to replace $y_{n+\frac{1}{2}}$ with the help of the explicit Euler method, Eq. (5.9),

$$ y_{n+\frac{1}{2}} = y_n + \frac{\Delta t}{2} \dot{y}_n = y_n + \frac{\Delta t}{2} f(y_n, t_n) , \tag{5.23} $$

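which, inserted into Eq. (5.22) yields

$$ y_{n+1} = y_n + f\left( y_n + \frac{\Delta t}{2} f(y_n, t_n), t_n + \frac{\Delta t}{2} \right) \Delta t + \mathcal{O}(\Delta t^2) . \tag{5.24} $$

We note that Eq. (5.24) is referred to as the explicit midpoint rule. In analogy we could have approximated $y_{n+\frac{1}{2}}$ with the help of the averaged function value $(y_n + y_{n+1})/2$, Eq. (5.13), which results in

$$ y_{n+1} = y_n + f\left( \frac{y_n + y_{n+1}}{2}, t_n + \frac{\Delta t}{2} \right) \Delta t + \mathcal{O}(\Delta t^2) . \tag{5.25} $$

This equation is referred to as the implicit midpoint rule. Let us explain how we obtain an estimate for the error in Eqs. (5.24) and (5.25). In case of Eq. (5.24) we investigate the term

$$ y_{n+1} - y_n - f\left( y_n + \frac{\Delta t}{2} f(y_n, t_n), t_n + \frac{\Delta t}{2} \right) \Delta t . $$

The Taylor expansion of $y_{n+1}$ and $f(\cdot)$ around the point $\Delta t = 0$ yields

$$ \Delta t \left[ \dot{y}_n - f(y_n, t_n) \right] + \frac{\Delta t^2}{2} \left[ \ddot{y}_n - \dot{f}(y_n, t_n) - f'(y_n, t_n) \dot{y}_n \right] + \ldots . \tag{5.26} $$

We observe that the first term cancels because of Eq. (5.7). Consequently, the error is of order $\Delta t^2$. (This estimate is conservative: by Eq. (5.17) the second bracket vanishes as well, so that the error made in a single step is in fact of order $\Delta t^3$.) A similar argument holds for Eq. (5.25).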
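The behaviour of the explicit midpoint rule (5.24) can also be checked numerically. The following Python sketch integrates the test problem $\dot{y} = -y$, $y(0) = 1$ up to $t = 1$ and compares with the exact solution $\exp(-t)$; the test problem and the function names are assumptions of this example.

```python
import math


def explicit_midpoint_step(f, y, t, dt):
    """One step of the explicit midpoint rule, Eq. (5.24)."""
    y_half = y + 0.5 * dt * f(y, t)          # Euler estimate, Eq. (5.23)
    return y + dt * f(y_half, t + 0.5 * dt)


def midpoint_error(dt):
    """Error at t = 1 for dy/dt = -y, y(0) = 1 (exact solution exp(-t))."""
    y = 1.0
    n_steps = round(1.0 / dt)
    for n in range(n_steps):
        y = explicit_midpoint_step(lambda y, t: -y, y, n * dt, dt)
    return abs(y - math.exp(-1.0))


if __name__ == "__main__":
    for dt in (0.02, 0.01, 0.005):
        print(dt, midpoint_error(dt))
```

In this experiment halving $\Delta t$ reduces the error accumulated at the fixed final time by a factor of about four, i.e. the accumulated error scales like $\Delta t^2$.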

Let us introduce a more convenient notation for the above examples before we concentrate on a more general topic. It is presented in algorithmic form, i.e. it defines the sequence in which one should calculate the various terms. This is convenient for two reasons: first, it increases the readability of complex methods such as Eq. (5.25) and, secondly, it is easy to identify which part of the method involves an implicit step and which part has to be solved separately for the corresponding variable. For this purpose let us introduce variables $Y_i$ of some index $i \geq 1$ and we use a simple example to illustrate this notation. Consider the explicit Euler method (5.9). It can be written as

$$ \begin{aligned} Y_1 &= y_n , \\ y_{n+1} &= y_n + f(Y_1, t_n) \Delta t . \end{aligned} \tag{5.27} $$

In a similar fashion we write the implicit Euler method (5.10) as

$$ \begin{aligned} Y_1 &= y_n + f(Y_1, t_{n+1}) \Delta t , \\ y_{n+1} &= y_n + f(Y_1, t_{n+1}) \Delta t . \end{aligned} \tag{5.28} $$

It is understood that the first equation of (5.28) has to be solved for $Y_1$ first and this result is then plugged into the second equation in order to obtain $y_{n+1}$. One further example: the Crank-Nicolson method (5.14) can be rewritten as

$$ \begin{aligned} Y_1 &= y_n , \\ Y_2 &= y_n + \frac{\Delta t}{2} \left[ f(Y_1, t_n) + f(Y_2, t_{n+1}) \right] , \\ y_{n+1} &= y_n + \frac{\Delta t}{2} \left[ f(Y_1, t_n) + f(Y_2, t_{n+1}) \right] , \end{aligned} \tag{5.29} $$

where the second equation is to be solved for $Y_2$ in the second step.

In analogy, the algorithmic form of the explicit midpoint rule (5.24) is defined as

$$ \begin{aligned} Y_1 &= y_n , \\ Y_2 &= y_n + \frac{\Delta t}{2} f(Y_1, t_n) , \\ y_{n+1} &= y_n + \Delta t \, f\left( Y_2, t_n + \frac{\Delta t}{2} \right) , \end{aligned} \tag{5.30} $$

and we find for the implicit midpoint rule (5.25):

$$ \begin{aligned} Y_1 &= y_n + \frac{\Delta t}{2} f\left( Y_1, t_n + \frac{\Delta t}{2} \right) , \\ y_{n+1} &= y_n + \Delta t \, f\left( Y_1, t_n + \frac{\Delta t}{2} \right) . \end{aligned} \tag{5.31} $$

The above algorithms are all examples of the so called Runge-Kutta methods. We introduce the general representation of a $d$-stage Runge-Kutta method:

$$ \begin{aligned} Y_i &= y_n + \Delta t \sum_{j=1}^{d} a_{ij} f(Y_j, t_n + c_j \Delta t) , \qquad i = 1, \ldots, d , \\ y_{n+1} &= y_n + \Delta t \sum_{j=1}^{d} b_j f(Y_j, t_n + c_j \Delta t) . \end{aligned} \tag{5.32} $$

We note that Eq. (5.32) is completely determined by the coefficients $a_{ij}$, $b_j$ and $c_j$. In particular $a = \{a_{ij}\}$ is a $d \times d$ matrix, while $b = \{b_j\}$ and $c = \{c_j\}$ are $d$ dimensional vectors.

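Butcher tableaus are a very useful tool to characterize such methods. They provide a structured representation of the coefficient matrix $a$ and the coefficient vectors $b$ and $c$:

$$ \begin{array}{c|cccc} c_1 & a_{11} & a_{12} & \ldots & a_{1d} \\ c_2 & a_{21} & a_{22} & \ldots & a_{2d} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ c_d & a_{d1} & a_{d2} & \ldots & a_{dd} \\ \hline & b_1 & b_2 & \ldots & b_d \end{array} \tag{5.33} $$

We note that the Runge-Kutta method (5.32) or (5.33) is explicit if the matrix $a$ is zero on and above the diagonal, i.e. $a_{ij} = 0$ for $j \geq i$. Let us rewrite all the methods described here in the form of Butcher tableaus:

Explicit Euler:

$$ \begin{array}{c|c} 0 & 0 \\ \hline & 1 \end{array} \tag{5.34} $$

Implicit Euler:

$$ \begin{array}{c|c} 1 & 1 \\ \hline & 1 \end{array} \tag{5.35} $$

Crank-Nicolson:

$$ \begin{array}{c|cc} 0 & 0 & 0 \\ 1 & \frac{1}{2} & \frac{1}{2} \\ \hline & \frac{1}{2} & \frac{1}{2} \end{array} \tag{5.36} $$

Explicit Midpoint:

$$ \begin{array}{c|cc} 0 & 0 & 0 \\ \frac{1}{2} & \frac{1}{2} & 0 \\ \hline & 0 & 1 \end{array} \tag{5.37} $$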

Implicit  Midpoint: 
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With  the  help  of  Runge-Kutta  methods  of  the  general  form  (5.32)  one  can 
develop  methods  of  arbitrary  accuracy.  One  of  the  most  popular  methods  is  the 
explicit  four  stage  method  (we  will  call  it  e-RK-4 )  which  is  defined  by  the  algorithm: 


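$$
\begin{aligned}
Y_1 &= y_n, \\
Y_2 &= y_n + \frac{\Delta t}{2} f(Y_1, t_n), \\
Y_3 &= y_n + \frac{\Delta t}{2} f\left(Y_2, t_n + \frac{\Delta t}{2}\right), \\
Y_4 &= y_n + \Delta t\, f\left(Y_3, t_n + \frac{\Delta t}{2}\right), \\
y_{n+1} &= y_n + \frac{\Delta t}{6}\left[f(Y_1,t_n) + 2 f\left(Y_2, t_n + \frac{\Delta t}{2}\right) + 2 f\left(Y_3, t_n + \frac{\Delta t}{2}\right) + f(Y_4, t_n + \Delta t)\right].
\end{aligned}
\tag{5.39}
$$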


This method is an analogue of the Simpson rule of numerical integration as
discussed in Sect. 3.4. However, a detailed derivation of the coefficient array $a$
and the coefficient vectors $b$ and $c$ is quite complicated. A closer inspection reveals
that the methodological error of this method behaves as $\Delta t^5$. The algorithm e-RK-4,
Eq. (5.39), is represented by a Butcher tableau of the form


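$$
\begin{array}{c|cccc}
0 & 0 & 0 & 0 & 0 \\
\tfrac{1}{2} & \tfrac{1}{2} & 0 & 0 & 0 \\
\tfrac{1}{2} & 0 & \tfrac{1}{2} & 0 & 0 \\
1 & 0 & 0 & 1 & 0 \\
\hline
 & \tfrac{1}{6} & \tfrac{1}{3} & \tfrac{1}{3} & \tfrac{1}{6}
\end{array}
\tag{5.40}
$$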
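
The general form (5.32) together with a Butcher tableau translates almost literally into code. The following minimal Python sketch implements one step of an explicit Runge-Kutta method for a scalar equation and uses the e-RK-4 coefficients of (5.40). The function names (`explicit_rk_step`, `integrate`) and the test problem $\dot y = -y$ are illustrative assumptions and are not taken from the text.

```python
import math

# Butcher tableau of e-RK-4, Eq. (5.40)
A = [[0.0, 0.0, 0.0, 0.0],
     [0.5, 0.0, 0.0, 0.0],
     [0.0, 0.5, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0]]
B = [1.0 / 6.0, 1.0 / 3.0, 1.0 / 3.0, 1.0 / 6.0]
C = [0.0, 0.5, 0.5, 1.0]


def explicit_rk_step(f, y, t, dt, a=A, b=B, c=C):
    """One step of an explicit Runge-Kutta method, Eq. (5.32), for scalar y."""
    k = []  # k[j] = f(Y_j, t_n + c_j * dt)
    for i in range(len(b)):
        # explicit method: only stages j < i contribute to Y_i
        Y_i = y + dt * sum(a[i][j] * k[j] for j in range(i))
        k.append(f(Y_i, t + c[i] * dt))
    return y + dt * sum(b_j * k_j for b_j, k_j in zip(b, k))


def integrate(f, y0, t0, t1, n):
    """Apply n equidistant steps from t0 to t1 (hypothetical helper)."""
    dt = (t1 - t0) / n
    y, t = y0, t0
    for _ in range(n):
        y = explicit_rk_step(f, y, t, dt)
        t += dt
    return y


def decay(y, t):
    """Assumed test problem dy/dt = -y with exact solution exp(-t)."""
    return -y


exact = math.exp(-1.0)
err_coarse = abs(integrate(decay, 1.0, 0.0, 1.0, 10) - exact)
err_fine = abs(integrate(decay, 1.0, 0.0, 1.0, 20) - exact)
print(err_coarse, err_fine, err_coarse / err_fine)
```

Halving the step size should reduce the accumulated error by a factor close to $2^4 = 16$, which is what one expects from a local error of order $\Delta t^5$ summed over a number of steps proportional to $1/\Delta t$.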


Another  quite  popular  method  is  given  by  the  Butcher  tableau 


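$$
\begin{array}{c|cc}
\tfrac{1}{2} - \tfrac{\sqrt{3}}{6} & \tfrac{1}{4} & \tfrac{1}{4} - \tfrac{\sqrt{3}}{6} \\
\tfrac{1}{2} + \tfrac{\sqrt{3}}{6} & \tfrac{1}{4} + \tfrac{\sqrt{3}}{6} & \tfrac{1}{4} \\
\hline
 & \tfrac{1}{2} & \tfrac{1}{2}
\end{array}
\tag{5.41}
$$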


We note that this method is implicit and mention that it corresponds to the two-point
Gauss-Legendre quadrature of Sect. 3.6.

A further improvement of implicit Runge-Kutta methods can be achieved by
choosing the $Y_i$ in such a way that they correspond to solutions of the differential
equation (5.7) at intermediate time steps. The intermediate time steps at which
one wants to reproduce the function are referred to as collocation points. At these
points the functions are approximated by interpolation on the basis of Lagrange
polynomials, which can easily be integrated analytically. However, the discussion
of such collocation methods [8] is far beyond the scope of this book.

In  general  Runge-Kutta  methods  are  very  useful.  However  one  always  has  to 
keep  in  mind  that  there  could  be  better  methods  for  the  problem  at  hand.  Let  us  close 
this  section  with  a  quote  from  the  book  by  PRESS  et  al.  [9]: 

For  many  scientific  users,  fourth-order  Runge-Kutta  is  not  just  the  first  word  on  ODE 
integrators,  but  the  last  word  as  well.  In  fact,  you  can  get  pretty  far  on  this  old  workhorse, 
especially  if  you  combine  it  with  an  adaptive  step-size  algorithm.  Keep  in  mind,  however, 
that  the  old  workhorse’s  last  trip  may  well  take  you  to  the  poorhouse:  Bulirsch-Stoer  or 
predictor-corrector  methods  can  be  very  much  more  efficient  for  problems  where  high 
accuracy  is  a  requirement.  Those  methods  are  the  high-strung  racehorses.  Runge-Kutta  is 
for  ploughing  the  fields. 


5.4  Hamiltonian  Systems:  Symplectic  Integrators 

Let us define a symplectic integrator as a numerical integration scheme in which the
mapping

$$
\Phi_{\Delta t}: y_n \mapsto y_{n+1},
\tag{5.42}
$$

is symplectic. Here $\Phi_{\Delta t}$ is referred to as the numerical flow of the method. If we
regard the initial value problem (5.1) we can define in an analogous way the flow of
the system $\varphi_t$ as

$$
\varphi_t(y_0) = y(t).
\tag{5.43}
$$

For instance, if we consider the initial value problem

$$
\dot{y} = A y, \qquad y(0) = y_0,
\tag{5.44}
$$

where $y \in \mathbb{R}^n$ and $A \in \mathbb{R}^{n \times n}$, then the flow $\varphi_t$ of the system is given by:

$$
\varphi_t(y_0) = \exp(At)\, y_0.
\tag{5.45}
$$

On the other hand, if we regard two vectors $v, w \in \mathbb{R}^2$, we can express the area
$\omega$ of the parallelogram spanned by these vectors as

$$
\omega(v,w) = \det\begin{pmatrix} a & c \\ b & d \end{pmatrix} = ad - bc,
\tag{5.46}
$$

where we put $v = (a,b)^T$ and $w = (c,d)^T$. More generally, if $v, w \in \mathbb{R}^{2d}$, we have

$$
\omega(v,w) = v^T \begin{pmatrix} 0 & I \\ -I & 0 \end{pmatrix} w = v^T J w,
\tag{5.47}
$$

where $I$ is the $d \times d$ unit matrix. Hence (5.47) represents the sum of
the projected areas of the form

$$
\det\begin{pmatrix} v_i & w_i \\ v_{i+d} & w_{i+d} \end{pmatrix}.
\tag{5.48}
$$

If we regard a linear mapping $M: \mathbb{R}^{2d} \to \mathbb{R}^{2d}$ and require that

$$
\omega(Mv, Mw) = \omega(v,w),
\tag{5.49}
$$

i.e. the area is preserved, we obtain the condition that

$$
M^T J M = J,
\tag{5.50}
$$

which implies $\det(M) = 1$ and, for $d = 1$, is equivalent to it. Finally, a differentiable mapping $f: \mathbb{R}^{2d} \to \mathbb{R}^{2d}$
is referred to as symplectic if the linear mapping $f'(x)$ (Jacobi matrix) conserves
$\omega$ for all $x \in \mathbb{R}^{2d}$. One can easily prove that the flow of Hamiltonian systems
is symplectic, i.e. area preserving in phase space. Every Hamiltonian system is
characterized by its Hamilton function $H(p,q)$ and the corresponding Hamilton
equations of motion [10-14]:

$$
\dot{p} = -\nabla_q H(p,q) \qquad \text{and} \qquad \dot{q} = \nabla_p H(p,q).
\tag{5.51}
$$

We define the flow of the system via

$$
\varphi_t(x_0) = x(t),
\tag{5.52}
$$

where

$$
x_0 = \begin{pmatrix} p_0 \\ q_0 \end{pmatrix} \qquad \text{and} \qquad x(t) = \begin{pmatrix} p(t) \\ q(t) \end{pmatrix}.
\tag{5.53}
$$

Hence we rewrite (5.51) as

$$
\dot{x} = J^{-1} \nabla_x H(x),
\tag{5.54}
$$

and note that $x = x(t, x_0)$ is a function of time and initial conditions. In a next step
we define the Jacobian of the flow via

$$
P_t(x_0) = \nabla_{x_0} \varphi_t(x_0),
\tag{5.55}
$$
and calculate

$$
\begin{aligned}
\dot{P}_t(x_0) &= \nabla_{x_0} \dot{x} \\
&= J^{-1} \nabla_{x_0} \nabla_x H(x) \\
&= J^{-1} \Delta_x H(x) \nabla_{x_0} x \\
&= J^{-1} \Delta_x H(x) P_t(x_0) \\
&= \begin{pmatrix} -\nabla_{qp} H(p,q) & -\nabla_{qq} H(p,q) \\ \nabla_{pp} H(p,q) & \nabla_{pq} H(p,q) \end{pmatrix} P_t(x_0).
\end{aligned}
\tag{5.56}
$$

Hence, $P_t$ is given by the solution of the equation

$$
\dot{P}_t = J^{-1} \Delta_x H(x) P_t.
\tag{5.57}
$$

Symplecticity ensures that the area

$$
P_t^T J P_t = \text{const},
\tag{5.58}
$$

which can be verified by calculating $\frac{d}{dt}\left(P_t^T J P_t\right)$ where we keep in mind that $J^T = -J$.
Hence,

$$
\begin{aligned}
\frac{d}{dt} P_t^T J P_t &= \dot{P}_t^T J P_t + P_t^T J \dot{P}_t \\
&= P_t^T \Delta_x H(x) (J^{-1})^T J P_t + P_t^T J J^{-1} \Delta_x H(x) P_t \\
&= 0,
\end{aligned}
\tag{5.59}
$$

even if the Hamilton function is not conserved. This means that the flow of a
Hamiltonian system is symplectic, i.e. area preserving in phase space [10, 11, 13].
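
The condition $M^T J M = J$ can be checked numerically for linear maps. The following Python sketch is an illustration under assumptions that are not part of the text: it uses the harmonic oscillator $H = (p^2 + q^2)/2$ with $x = (p, q)^T$ and $d = 1$, and it compares the exact flow over one time step with the one-step maps of the explicit Euler method and of the symplectic Euler method (5.60). The helper names are hypothetical.

```python
import math

J = [[0.0, 1.0], [-1.0, 0.0]]  # structure matrix for d = 1, x = (p, q)


def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def transpose(a):
    return [[a[j][i] for j in range(2)] for i in range(2)]


def symplectic_defect(m):
    """Largest entry of M^T J M - J; it vanishes for a symplectic map."""
    r = matmul(transpose(m), matmul(J, m))
    return max(abs(r[i][j] - J[i][j]) for i in range(2) for j in range(2))


dt = 0.1
# H = (p^2 + q^2)/2 gives dp/dt = -q, dq/dt = p (assumed example system)
exact_flow = [[math.cos(dt), -math.sin(dt)],
              [math.sin(dt), math.cos(dt)]]
# explicit Euler: p' = p - dt*q, q' = q + dt*p
explicit_euler = [[1.0, -dt],
                  [dt, 1.0]]
# symplectic Euler: p' = p - dt*q, q' = q + dt*p'
symplectic_euler = [[1.0, -dt],
                    [dt, 1.0 - dt * dt]]

for name, m in (("exact flow", exact_flow),
                ("explicit Euler", explicit_euler),
                ("symplectic Euler", symplectic_euler)):
    print(name, symplectic_defect(m))
```

For this example the explicit Euler map scales every phase space area by $1 + \Delta t^2$ per step, while the exact flow and the symplectic Euler map leave it unchanged up to rounding.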

Since this conservation law is violated by methods like e-RK-4 or explicit Euler,
one introduces so-called symplectic integrators, which have been particularly
designed as a remedy to this shortcoming. A detailed investigation of these
techniques is far too involved for this book. The interested reader is referred to Refs.
[12, 15-17].

However, we provide a list of the most important integrators.


Symplectic Euler

$$
q_{n+1} = q_n + a(q_n, p_{n+1}) \Delta t,
\tag{5.60a}
$$

$$
p_{n+1} = p_n + b(q_n, p_{n+1}) \Delta t.
\tag{5.60b}
$$

Here $a(p,q) = \nabla_p H(p,q)$ and $b(p,q) = -\nabla_q H(p,q)$ have already been defined
in Sect. 4.2.


Symplectic  Runge-Kutta 

It can be demonstrated that a Runge-Kutta method is symplectic if the coefficients
fulfill

$$
b_i a_{ij} + b_j a_{ji} = b_i b_j,
\tag{5.61}
$$

for all $i, j$ [16, 18]. This is a property of the collocation methods based on Gauss
points $c_i$.
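
Condition (5.61) is easy to test for a given Butcher tableau. The short Python sketch below checks it for the two-stage Gauss-Legendre method (5.41) and for e-RK-4 (5.40). The tableaus used for the implicit midpoint rule and for the explicit Euler method are the standard ones and are assumed here, and the function name `satisfies_5_61` is hypothetical.

```python
import math


def satisfies_5_61(a, b, tol=1e-12):
    """True if b_i a_ij + b_j a_ji = b_i b_j holds for all i, j, Eq. (5.61)."""
    d = len(b)
    return all(abs(b[i] * a[i][j] + b[j] * a[j][i] - b[i] * b[j]) < tol
               for i in range(d) for j in range(d))


s = math.sqrt(3.0) / 6.0
methods = {
    # two-stage Gauss-Legendre method, tableau (5.41)
    "Gauss-Legendre (2 stages)": ([[0.25, 0.25 - s],
                                   [0.25 + s, 0.25]], [0.5, 0.5]),
    # standard one-stage tableaus (assumed)
    "implicit midpoint": ([[0.5]], [1.0]),
    "explicit Euler": ([[0.0]], [1.0]),
    # e-RK-4, tableau (5.40)
    "e-RK-4": ([[0.0, 0.0, 0.0, 0.0],
                [0.5, 0.0, 0.0, 0.0],
                [0.0, 0.5, 0.0, 0.0],
                [0.0, 0.0, 1.0, 0.0]],
               [1.0 / 6.0, 1.0 / 3.0, 1.0 / 3.0, 1.0 / 6.0]),
}

results = {name: satisfies_5_61(a, b) for name, (a, b) in methods.items()}
for name, ok in results.items():
    print(name, ok)
```

Only the two Gauss-type methods pass the check. Note that (5.61) is stated in the text as a sufficient condition, so the sketch merely reports whether a tableau fulfills it.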


5.5  An  Example:  The  Kepler  Problem,  Revisited 

It has already been discussed in Sect. 4.2 that the Hamilton function of this system
takes on the form [19]

$$
H(p,q) = \frac{1}{2}\left(p_1^2 + p_2^2\right) - \frac{1}{\sqrt{q_1^2 + q_2^2}},
\tag{5.62}
$$

and Hamilton's equations of motion read

$$
\dot{p}_1 = -\nabla_{q_1} H(p,q) = -\frac{q_1}{\left(q_1^2 + q_2^2\right)^{3/2}},
\tag{5.63a}
$$

$$
\dot{p}_2 = -\nabla_{q_2} H(p,q) = -\frac{q_2}{\left(q_1^2 + q_2^2\right)^{3/2}},
\tag{5.63b}
$$

$$
\dot{q}_1 = \nabla_{p_1} H(p,q) = p_1,
\tag{5.63c}
$$

$$
\dot{q}_2 = \nabla_{p_2} H(p,q) = p_2.
\tag{5.63d}
$$

We now introduce the time instances $t_n = t_0 + n\Delta t$ and define $q_i^n \equiv q_i(t_n)$ and
$p_i^n \equiv p_i(t_n)$ for $i = 1, 2$. In the following we give the discretized recursion relations
for three different methods, namely explicit Euler, implicit Euler, and symplectic
Euler.
Explicit Euler

In case of the explicit Euler method we have simple recursion relations

$$
p_1^{n+1} = p_1^n - \frac{q_1^n \Delta t}{\left[(q_1^n)^2 + (q_2^n)^2\right]^{3/2}},
\tag{5.64a}
$$

$$
p_2^{n+1} = p_2^n - \frac{q_2^n \Delta t}{\left[(q_1^n)^2 + (q_2^n)^2\right]^{3/2}},
\tag{5.64b}
$$

$$
q_1^{n+1} = q_1^n + p_1^n \Delta t,
\tag{5.64c}
$$

$$
q_2^{n+1} = q_2^n + p_2^n \Delta t.
\tag{5.64d}
$$
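
The recursion (5.64) can be sketched in a few lines of Python. The example below is a minimal illustration and not the authors' implementation: the state layout `(p1, p2, q1, q2)`, the function names and the integration length are assumptions, while the initial values follow Hairer et al. [16] with $e = 0.6$ (so that $H = -1/2$) and the step size is $\Delta t = 0.005$.

```python
import math


def energy(p1, p2, q1, q2):
    """Hamilton function (5.62) of the Kepler problem."""
    return 0.5 * (p1 * p1 + p2 * p2) - 1.0 / math.hypot(q1, q2)


def explicit_euler_step(p1, p2, q1, q2, dt):
    """Recursion (5.64): every right-hand side uses values at time level n."""
    r3 = (q1 * q1 + q2 * q2) ** 1.5
    return (p1 - q1 * dt / r3,
            p2 - q2 * dt / r3,
            q1 + p1 * dt,
            q2 + p2 * dt)


e = 0.6
# state = (p1, p2, q1, q2) at t = 0
state = (0.0, math.sqrt((1.0 + e) / (1.0 - e)), 1.0 - e, 0.0)
h0 = energy(*state)

dt = 0.005
for _ in range(2000):  # integrate up to t = 10 (assumed length)
    state = explicit_euler_step(*state, dt)

drift = energy(*state) - h0
print("H(0) =", h0, " H(10) - H(0) =", drift)
```

The printed energy difference makes the violation of energy conservation visible; it is largest in magnitude while the point-mass passes the perihelion, where the step covers the largest distance.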


Implicit  Euler 

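We obtain the implicit equations

$$
p_1^{n+1} = p_1^n - \frac{q_1^{n+1} \Delta t}{\left[(q_1^{n+1})^2 + (q_2^{n+1})^2\right]^{3/2}},
\tag{5.65a}
$$

$$
p_2^{n+1} = p_2^n - \frac{q_2^{n+1} \Delta t}{\left[(q_1^{n+1})^2 + (q_2^{n+1})^2\right]^{3/2}},
\tag{5.65b}
$$

$$
q_1^{n+1} = q_1^n + p_1^{n+1} \Delta t,
\tag{5.65c}
$$

$$
q_2^{n+1} = q_2^n + p_2^{n+1} \Delta t.
\tag{5.65d}
$$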


These  implicit  equations  can  be  solved,  for  instance,  by  the  use  of  the  Newton 
method  discussed  in  Appendix  B. 


Symplectic  Euler 

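Employing Eqs. (5.60) gives

$$
p_1^{n+1} = p_1^n - \frac{q_1^n \Delta t}{\left[(q_1^n)^2 + (q_2^n)^2\right]^{3/2}},
\tag{5.66a}
$$

$$
p_2^{n+1} = p_2^n - \frac{q_2^n \Delta t}{\left[(q_1^n)^2 + (q_2^n)^2\right]^{3/2}},
\tag{5.66b}
$$

$$
q_1^{n+1} = q_1^n + p_1^{n+1} \Delta t,
\tag{5.66c}
$$

$$
q_2^{n+1} = q_2^n + p_2^{n+1} \Delta t.
\tag{5.66d}
$$

These implicit equations can be solved analytically and we obtain

$$
p_1^{n+1} = p_1^n - \frac{q_1^n \Delta t}{\left[(q_1^n)^2 + (q_2^n)^2\right]^{3/2}},
\tag{5.67a}
$$

$$
p_2^{n+1} = p_2^n - \frac{q_2^n \Delta t}{\left[(q_1^n)^2 + (q_2^n)^2\right]^{3/2}},
\tag{5.67b}
$$

$$
q_1^{n+1} = q_1^n + p_1^n \Delta t - \frac{q_1^n \Delta t^2}{\left[(q_1^n)^2 + (q_2^n)^2\right]^{3/2}},
\tag{5.67c}
$$

$$
q_2^{n+1} = q_2^n + p_2^n \Delta t - \frac{q_2^n \Delta t^2}{\left[(q_1^n)^2 + (q_2^n)^2\right]^{3/2}}.
\tag{5.67d}
$$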
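
In code, the symplectic Euler recursion differs from the explicit Euler method only in that the updated momenta enter the position update. The Python sketch below is a minimal illustration under assumptions: the state layout `(p1, p2, q1, q2)`, the function names and the number of steps are chosen here, while $e = 0.6$ and $\Delta t = 0.01$ are the values quoted for this example.

```python
import math


def energy(p1, p2, q1, q2):
    """Hamilton function (5.62) of the Kepler problem."""
    return 0.5 * (p1 * p1 + p2 * p2) - 1.0 / math.hypot(q1, q2)


def symplectic_euler_step(p1, p2, q1, q2, dt):
    """Recursion (5.66): momenta use q^n, positions use the new momenta."""
    r3 = (q1 * q1 + q2 * q2) ** 1.5
    p1_new = p1 - q1 * dt / r3
    p2_new = p2 - q2 * dt / r3
    return p1_new, p2_new, q1 + p1_new * dt, q2 + p2_new * dt


e = 0.6
state = (0.0, math.sqrt((1.0 + e) / (1.0 - e)), 1.0 - e, 0.0)
h0 = energy(*state)

dt = 0.01
max_dev = 0.0
for _ in range(5000):  # integrate up to t = 50 (assumed length)
    state = symplectic_euler_step(*state, dt)
    max_dev = max(max_dev, abs(energy(*state) - h0))

print("largest |H(t) - H(0)| =", max_dev)
```

Substituting the new momenta into the position update reproduces the closed form (5.67c) and (5.67d), so no nonlinear solver is required for this variant.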

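A second possibility of the symplectic Euler is given by Eq. (4.33). It reads

$$
p_1^{n+1} = p_1^n - \frac{q_1^{n+1} \Delta t}{\left[(q_1^{n+1})^2 + (q_2^{n+1})^2\right]^{3/2}},
\tag{5.68a}
$$

$$
p_2^{n+1} = p_2^n - \frac{q_2^{n+1} \Delta t}{\left[(q_1^{n+1})^2 + (q_2^{n+1})^2\right]^{3/2}},
\tag{5.68b}
$$

$$
q_1^{n+1} = q_1^n + p_1^n \Delta t,
\tag{5.68c}
$$

$$
q_2^{n+1} = q_2^n + p_2^n \Delta t.
\tag{5.68d}
$$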


The trajectories calculated using these four methods are presented in Figs. 5.1
and 5.2; the time evolution of the total energy of the system is plotted in Fig. 5.3.
The initial conditions were [16]

$$
p_1(0) = 0, \qquad q_1(0) = 1 - e,
\tag{5.69}
$$

and

$$
p_2(0) = \sqrt{\frac{1+e}{1-e}}, \qquad q_2(0) = 0,
\tag{5.70}
$$

with $e = 0.6$ which gives $H = -1/2$. Furthermore, we set $\Delta t = 0.01$ for the
symplectic Euler methods and $\Delta t = 0.005$ for the forward and backward Euler
methods in order to reduce the methodological error. The implicit equations were
solved with the help of the Newton method as discussed in Appendix B. The Jacobi
matrix was calculated analytically, hence no methodological error enters because
approximations of derivatives were unnecessary.

Fig. 5.1 Kepler trajectories in position space for the initial values defined in Eqs. (5.69)
and (5.70). They are indicated by a solid square. Solutions have been generated (a) by the explicit
Euler method (5.64), (b) by the implicit Euler method (5.65), (c) by the symplectic Euler
method (5.67), and (d) by the symplectic Euler method (5.68)

According to theory [19] the q-space and p-space projections of the phase space
trajectory are ellipses. Furthermore, energy and angular momentum are conserved.
Thus, the numerical solutions of Hamilton's equations of motion (5.63) should
reflect these properties. Figures 5.1a, b and 5.2a, b present the results of the
explicit Euler method, Eqs. (5.64), and the implicit Euler method, Eqs. (5.65),
respectively. Obviously, the result does not agree with the theoretical expectation
and the trajectories are open instead of closed. The reason for this behavior is
the methodological error of the method which is accumulative and, thus, causes
a violation of energy conservation. This violation becomes apparent in Fig. 5.3
where the total energy $H(t)$ is plotted vs time $t$. Neither the explicit Euler method
(dashed line) nor the implicit Euler method (short dashed line) conform to the
requirement of energy conservation. We also see step-like structures of $H(t)$. At the
center of these steps an open diamond symbol and in the case of the implicit Euler
method an additional open circle indicate the position in time of the perihelion
of the point-mass (point of closest approach to the center of attraction). It is
indicated by the same symbols in Fig. 5.1a, b. At this point the point-mass reaches
its maximum velocity, the pericenter velocity, and it covers the biggest distances
along its trajectory per time interval $\Delta t$. Consequently, the methodological error is
biggest in this part of the trajectory which manifests itself in those steps in $H(t)$. As
the point-mass moves 'faster' when the implicit Euler method is applied, again,
the distances covered per time interval are greater than those covered by the
point-mass in the explicit Euler method. Thus, it is not surprising that the error of the
implicit Euler method is bigger as well when $H(t)$ is determined.

Fig. 5.2 Kepler trajectories in momentum space for the initial values defined in Eqs. (5.69)
and (5.70). They are indicated by a solid square. Solutions have been generated (a) by the explicit
Euler method (5.64), (b) by the implicit Euler method (5.65), (c) by the symplectic Euler
method (5.67), and (d) by the symplectic Euler method (5.68)

Fig. 5.3 Time evolution of the total energy $H$ calculated with the help of the four methods
discussed in the text. The initial values are given by Eqs. (5.69) and (5.70). Solutions have been
generated (i) by the explicit Euler method (5.64) (dashed line), (ii) by the implicit Euler
method (5.65) (dotted line), (iii) by the symplectic Euler method (5.67) (solid line), and (iv)
by the symplectic Euler method (5.68) (dashed-dotted line)

These  results  are  in  strong  contrast  to  the  numerical  solutions  of  Eqs.  (5.63) 
obtained  with  the  help  of  symplectic  Euler  methods  which  are  presented  in 
Figs.  5.1c,  d  and  5.2c,  d.  The  trajectories  are  almost  perfect  ellipses  for  both 
symplectic  methods  which  follow  Eqs.  (5.67)  and  (5.68).  Moreover,  the  total  energy 
$H(t)$ (solid and dashed-dotted lines in Fig. 5.3) varies very little as a function of $t$.
Deviations  from  the  mean  value  can  only  be  observed  around  the  perihelion  which 
is  indicated  by  a  solid  square.  Moreover,  these  deviations  compensate  because  of  the 
symplectic  nature  of  the  method.  This  demonstrates  that  symplectic  integrators  are 
the  appropriate  technique  to  solve  the  equations  of  motion  of  Hamiltonian  systems. 
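
Besides the energy, the angular momentum $q_1 p_2 - q_2 p_1$ offers a second check of the two types of integrators. The Python sketch below compares the explicit Euler recursion (5.64) with the symplectic Euler recursion (5.66). It is an illustration under assumptions: the helper names and the number of steps are chosen here, and the step sizes are those quoted above.

```python
import math


def explicit_euler_step(p1, p2, q1, q2, dt):
    """Recursion (5.64)."""
    r3 = (q1 * q1 + q2 * q2) ** 1.5
    return p1 - q1 * dt / r3, p2 - q2 * dt / r3, q1 + p1 * dt, q2 + p2 * dt


def symplectic_euler_step(p1, p2, q1, q2, dt):
    """Recursion (5.66)."""
    r3 = (q1 * q1 + q2 * q2) ** 1.5
    p1, p2 = p1 - q1 * dt / r3, p2 - q2 * dt / r3
    return p1, p2, q1 + p1 * dt, q2 + p2 * dt


def angular_momentum(p1, p2, q1, q2):
    return q1 * p2 - q2 * p1


def final_angular_momentum(step, dt, n_steps, start):
    """Apply n_steps of the given one-step map (hypothetical helper)."""
    state = start
    for _ in range(n_steps):
        state = step(*state, dt)
    return angular_momentum(*state)


e = 0.6
start = (0.0, math.sqrt((1.0 + e) / (1.0 - e)), 1.0 - e, 0.0)
l0 = angular_momentum(*start)

l_explicit = final_angular_momentum(explicit_euler_step, 0.005, 4000, start)
l_symplectic = final_angular_momentum(symplectic_euler_step, 0.01, 2000, start)
print(l0, l_explicit, l_symplectic)
```

Inserting the recursions into $q_1 p_2 - q_2 p_1$ shows why: the symplectic Euler step leaves this quantity unchanged, whereas each explicit Euler step multiplies it by $1 + \Delta t^2 / \left[(q_1^n)^2 + (q_2^n)^2\right]^{3/2}$.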


Summary 

We  concentrated  on  numerical  methods  to  solve  the  initial  value  problem  of  ordinary 
differential  equations.  The  methods  discussed  here  rely  heavily  on  the  various 
methods  developed  for  numerical  integration  because  we  can  always  find  an  integral 
representation of this kind of equation. The simple integrators known from Chap. 4
were augmented by the more general Crank-Nicolson method which was
based  on  the  trapezoidal  rule  introduced  in  Sect.  3.3.  The  simple  single-step  methods 
were  improved  in  their  methodological  error  by  Taylor  series  methods,  linear 
multi-step  methods,  and  by  the  Runge-Kutta  method.  The  latter  took  intermediate 
points  within  the  time  interval  [tn,tn+ 1]  into  account.  In  principle,  it  is  possible 
to  achieve  almost  arbitrary  accuracy  with  such  a  method.  Nevertheless,  all  those 
methods  had  the  disadvantage  that  because  of  their  methodological  error  energy 
conservation  was  violated  when  applied  to  Hamiltonian  systems.  As  this  problem 
can  be  remedied  by  symplectic  integrators  a  short  introduction  into  this  topic  was 
provided  and  the  most  important  symplectic  integrators  were  presented.  The  final 
discussion  of  Kepler’s  two-body  problem  elucidated  the  various  points  discussed 
throughout  this  chapter. 

Problems 

1. Write a program to solve numerically the Kepler problem. The Hamilton
function of the problem is defined as

$$
H(p,q) = \frac{1}{2}\left(p_1^2 + p_2^2\right) - \frac{1}{\sqrt{q_1^2 + q_2^2}},
$$

and the initial conditions are given by

$$
p_1(0) = 0, \qquad q_1(0) = 1 - e, \qquad p_2(0) = \sqrt{\frac{1+e}{1-e}}, \qquad q_2(0) = 0,
$$

where $e = 0.6$. Derive Hamilton's equations of motion and implement an
algorithm which solves these equations based on the following methods:

(a) Explicit Euler,

(b) Symplectic Euler.

2.  Plot  the  trajectories  and  the  total  energy  as  a  function  of  time.  You  can  use  the 
results  presented  in  Figs.  5.1  and  5.2  to  check  your  code.  Modify  the  initial 
conditions  and  discuss  the  results!  Try  to  confirm  Kepler’s  laws  of  planetary 
motion  with  the  help  of  your  algorithm. 

3.  Use  a  symplectic  integrator  to  study  Lennard-Jones  scattering;  see  Problems 
of  Chap.  4 

4.  Solve  the  differential  equation  (5.6)  numerically  with  different  methods.  Use  also 
the  Taylor  series  method  (5.18). 
Chapter 6

The  Double  Pendulum 


6.1  Hamilton’s  Equations 

We investigate the dynamics of the double pendulum in two spatial dimensions
as  illustrated  schematically  in  Fig.  6.1.  It  is  the  aim  of  this  section  to  derive 
Hamilton’s  equations  of  motion  for  this  system.  In  a  first  step  we  introduce 
generalized  coordinates  and  determine  the  Lagrange  function  of  the  system  from 
its  kinetic  and  potential  energy  [1-5].  We  then  introduce  generalized  momenta 
and,  finally,  derive  the  Hamilton  function  from  which  Hamilton’s  equations  of 
motion  follow.  They  will  serve  as  a  starting  point  for  the  formulation  of  a  numerical 
method. 

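From Fig. 6.1 we find the coordinates of the two point masses $m$:

$$
x_1 = l \sin(\varphi_1), \qquad z_1 = 2l - l \cos(\varphi_1),
\tag{6.1}
$$

and

$$
x_2 = l\left[\sin(\varphi_1) + \sin(\varphi_2)\right], \qquad z_2 = 2l - l\left[\cos(\varphi_1) + \cos(\varphi_2)\right].
\tag{6.2}
$$

Here, $2l$ is the pendulum's total length. The angles $\varphi_i$, $i = 1, 2$ are defined in
Fig. 6.1.

We note that $l = \text{const}$ and obtain the time derivatives of the coordinates (6.1)
and (6.2):

$$
\dot{x}_1 = l \dot{\varphi}_1 \cos(\varphi_1),
\tag{6.3}
$$

$$
\dot{z}_1 = l \dot{\varphi}_1 \sin(\varphi_1),
\tag{6.4}
$$

$$
\dot{x}_2 = l\left[\dot{\varphi}_1 \cos(\varphi_1) + \dot{\varphi}_2 \cos(\varphi_2)\right],
\tag{6.5}
$$

$$
\dot{z}_2 = l\left[\dot{\varphi}_1 \sin(\varphi_1) + \dot{\varphi}_2 \sin(\varphi_2)\right].
\tag{6.6}
$$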

Fig. 6.1 Schematic illustration of the double pendulum. $m$ are the point-masses, $2l$ is the total
length of the pendulum and $\varphi_1$, $\varphi_2$ are the corresponding angles


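The Lagrange function of the system is defined by

$$
L = T - U,
\tag{6.7}
$$

with the kinetic energy $T$ and the potential energy $U$. The kinetic energy $T$ is given by¹

$$
\begin{aligned}
T &= \frac{m}{2}\left(\dot{x}_1^2 + \dot{z}_1^2 + \dot{x}_2^2 + \dot{z}_2^2\right) \\
&= \frac{m l^2}{2}\left[2\dot{\varphi}_1^2 + \dot{\varphi}_2^2 + 2\dot{\varphi}_1 \dot{\varphi}_2 \cos(\varphi_1 - \varphi_2)\right].
\end{aligned}
\tag{6.8}
$$

The potential energy $U$ is determined by the gravitational force

$$
\begin{aligned}
U &= m g z_1 + m g z_2 \\
&= m g l\left[4 - 2\cos(\varphi_1) - \cos(\varphi_2)\right],
\end{aligned}
\tag{6.9}
$$

where $g$ is the acceleration due to gravity. Hence, we get for the Lagrange
function $L$:

$$
L = \frac{m l^2}{2}\left[2\dot{\varphi}_1^2 + \dot{\varphi}_2^2 + 2\dot{\varphi}_1 \dot{\varphi}_2 \cos(\varphi_1 - \varphi_2)\right] - m g l\left[4 - 2\cos(\varphi_1) - \cos(\varphi_2)\right].
\tag{6.10}
$$

¹ We make use of the relation:

$$
\sin(x)\sin(y) + \cos(x)\cos(y) = \cos(x - y).
$$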
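We find a description of the motion in phase space by calculating the generalized
momenta $p_i$, $i = 1, 2$ as

$$
p_1 = \frac{\partial L}{\partial \dot{\varphi}_1} = m l^2\left[2\dot{\varphi}_1 + \dot{\varphi}_2 \cos(\varphi_1 - \varphi_2)\right],
\tag{6.11}
$$

and

$$
p_2 = \frac{\partial L}{\partial \dot{\varphi}_2} = m l^2\left[\dot{\varphi}_2 + \dot{\varphi}_1 \cos(\varphi_1 - \varphi_2)\right].
\tag{6.12}
$$

The aim is now to express the kinetic energy (6.8) in terms of generalized
momenta $p_1$ and $p_2$. To accomplish this we solve in a first step Eq. (6.12) for $\dot{\varphi}_2$
and obtain

$$
\dot{\varphi}_2 = \frac{p_2}{m l^2} - \dot{\varphi}_1 \cos(\varphi_1 - \varphi_2).
\tag{6.13}
$$

This is used to rewrite Eq. (6.11). Solving for $\dot{\varphi}_1$ gives:

$$
\dot{\varphi}_1 = \left[2 - \cos^2(\varphi_1 - \varphi_2)\right]^{-1}\left[\frac{p_1}{m l^2} - \frac{p_2}{m l^2}\cos(\varphi_1 - \varphi_2)\right].
\tag{6.14}
$$

The trigonometric identity $\sin^2(x) + \cos^2(x) = 1$ changes Eq. (6.14) into

$$
\dot{\varphi}_1 = \frac{1}{m l^2}\,\frac{p_1 - p_2 \cos(\varphi_1 - \varphi_2)}{1 + \sin^2(\varphi_1 - \varphi_2)}.
\tag{6.15}
$$

This is then used to transform Eq. (6.13) into

$$
\begin{aligned}
\dot{\varphi}_2 &= \frac{1}{m l^2}\left[p_2 - \frac{p_1 \cos(\varphi_1 - \varphi_2) - p_2 \cos^2(\varphi_1 - \varphi_2)}{1 + \sin^2(\varphi_1 - \varphi_2)}\right] \\
&= \frac{1}{m l^2}\,\frac{2 p_2 - p_1 \cos(\varphi_1 - \varphi_2)}{1 + \sin^2(\varphi_1 - \varphi_2)}.
\end{aligned}
\tag{6.16}
$$

Hence, with the help of Eqs. (6.15) and (6.16) we can reevaluate the kinetic
energy (6.8) to give

$$
\begin{aligned}
T &= \frac{m l^2}{2}\left[2\dot{\varphi}_1^2 + \dot{\varphi}_2^2 + 2\dot{\varphi}_1 \dot{\varphi}_2 \cos(\varphi_1 - \varphi_2)\right] \\
&= \frac{1}{2 m l^2}\,\frac{p_1^2 + 2 p_2^2 - 2 p_1 p_2 \cos(\varphi_1 - \varphi_2)}{1 + \sin^2(\varphi_1 - \varphi_2)}.
\end{aligned}
\tag{6.17}
$$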
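The Hamilton function $H(p_1, p_2, \varphi_1, \varphi_2)$ is the sum of the kinetic energy (6.17)
and the potential energy (6.9) and we get:

$$
\begin{aligned}
H &= T + U \\
&= \frac{1}{2 m l^2}\,\frac{p_1^2 + 2 p_2^2 - 2 p_1 p_2 \cos(\varphi_1 - \varphi_2)}{1 + \sin^2(\varphi_1 - \varphi_2)} + m g l\left[4 - 2\cos(\varphi_1) - \cos(\varphi_2)\right].
\end{aligned}
\tag{6.18}
$$

Thus, we are now, finally, in a position to formulate Hamilton's equations of
motion from

$$
\dot{\varphi}_i = \frac{\partial H}{\partial p_i}, \qquad \dot{p}_i = -\frac{\partial H}{\partial \varphi_i}, \qquad i = 1, 2,
\tag{6.19}
$$

and the dynamics of the double pendulum are determined by the solutions of the
following set of differential equations:

$$
\dot{\varphi}_1 = \frac{1}{m l^2}\,\frac{p_1 - p_2 \cos(\varphi_1 - \varphi_2)}{1 + \sin^2(\varphi_1 - \varphi_2)},
\tag{6.20a}
$$

$$
\dot{\varphi}_2 = \frac{1}{m l^2}\,\frac{2 p_2 - p_1 \cos(\varphi_1 - \varphi_2)}{1 + \sin^2(\varphi_1 - \varphi_2)},
\tag{6.20b}
$$

$$
\begin{aligned}
\dot{p}_1 = {} & \frac{1}{m l^2}\,\frac{1}{1 + \sin^2(\varphi_1 - \varphi_2)}\Bigg[-p_1 p_2 \sin(\varphi_1 - \varphi_2) \\
& + \frac{p_1^2 + 2 p_2^2 - 2 p_1 p_2 \cos(\varphi_1 - \varphi_2)}{1 + \sin^2(\varphi_1 - \varphi_2)} \cos(\varphi_1 - \varphi_2)\sin(\varphi_1 - \varphi_2)\Bigg] \\
& - 2 m g l \sin(\varphi_1),
\end{aligned}
\tag{6.20c}
$$

and

$$
\begin{aligned}
\dot{p}_2 = {} & \frac{1}{m l^2}\,\frac{1}{1 + \sin^2(\varphi_1 - \varphi_2)}\Bigg[p_1 p_2 \sin(\varphi_1 - \varphi_2) \\
& - \frac{p_1^2 + 2 p_2^2 - 2 p_1 p_2 \cos(\varphi_1 - \varphi_2)}{1 + \sin^2(\varphi_1 - \varphi_2)} \sin(\varphi_1 - \varphi_2)\cos(\varphi_1 - \varphi_2)\Bigg] \\
& - m g l \sin(\varphi_2).
\end{aligned}
\tag{6.20d}
$$

The following section is dedicated to the numerical solution of Eqs. (6.20) with
the help of the explicit Runge-Kutta algorithm e-RK-4 introduced in Sect. 5.3.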


6.2  Numerical  Solution 

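In a first step we recognize that Eqs. (6.20) are of the form

$$
\dot{y} = F(y),
\tag{6.21}
$$

where $y \in \mathbb{R}^4$. Let us define

$$
y = \begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{pmatrix} = \begin{pmatrix} \varphi_1 \\ \varphi_2 \\ p_1 \\ p_2 \end{pmatrix},
\tag{6.22}
$$

and consequently

$$
F(y) = \begin{pmatrix} f_1(y) \\ f_2(y) \\ f_3(y) \\ f_4(y) \end{pmatrix}.
\tag{6.23}
$$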

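We introduce time instances $t_n = n\Delta t$, $n \in \mathbb{N}$ and use the notation $y_n \equiv y(t_n) =
(y_1^n, y_2^n, y_3^n, y_4^n)^T$. Furthermore, $F(y)$ is not an explicit function of time $t$ and we
reformulate the e-RK-4 algorithm of Eq. (5.39) as:

$$
\begin{aligned}
Y_1 &= y_n, \\
Y_2 &= y_n + \frac{\Delta t}{2} F(Y_1), \\
Y_3 &= y_n + \frac{\Delta t}{2} F(Y_2), \\
Y_4 &= y_n + \Delta t\, F(Y_3), \\
y_{n+1} &= y_n + \frac{\Delta t}{6}\left[F(Y_1) + 2F(Y_2) + 2F(Y_3) + F(Y_4)\right].
\end{aligned}
\tag{6.24}
$$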
Fig. 6.2 Numerical solution of the double pendulum with initial conditions $\varphi_1(0) = \varphi_2(0) = 0.0$,
$p_1(0) = 4.0$ and $p_2(0) = 2.0$. (a) Trajectory in $\varphi$-space, (b) trajectory in $p$-space, and (c) trajectory
in local $(x, z)$-space. The solid circles numbered 1 and 2 represent the two masses in their initial
configuration


Hence, the only remaining challenge is to correctly implement the function $F(y) =
[f_1(y), f_2(y), f_3(y), f_4(y)]^T$ according to Eqs. (6.20).
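
One way to guard against sign errors in $F(y)$ is to compare it with numerical derivatives of the Hamilton function (6.18). The following Python sketch does this at a single point of phase space. It is meant as an illustration only: the function names, the finite-difference step `eps` and the test point are assumptions, and the parameters are set to $m = l = 1$ and $g = 9.8067$.

```python
import math

M, L, G = 1.0, 1.0, 9.8067  # mass m, length l, gravitational acceleration g


def hamiltonian(y):
    """Hamilton function (6.18) for y = (phi1, phi2, p1, p2)."""
    phi1, phi2, p1, p2 = y
    delta = phi1 - phi2
    num = p1 * p1 + 2.0 * p2 * p2 - 2.0 * p1 * p2 * math.cos(delta)
    den = 1.0 + math.sin(delta) ** 2
    return num / (2.0 * M * L * L * den) + M * G * L * (
        4.0 - 2.0 * math.cos(phi1) - math.cos(phi2))


def F(y):
    """Right-hand side of Eqs. (6.20) in the ordering of (6.22)."""
    phi1, phi2, p1, p2 = y
    delta = phi1 - phi2
    s, c = math.sin(delta), math.cos(delta)
    den = 1.0 + s * s
    num = p1 * p1 + 2.0 * p2 * p2 - 2.0 * p1 * p2 * c
    pref = 1.0 / (M * L * L * den)
    bracket = -p1 * p2 * s + num / den * s * c  # square bracket of (6.20c)
    return [pref * (p1 - p2 * c),
            pref * (2.0 * p2 - p1 * c),
            pref * bracket - 2.0 * M * G * L * math.sin(phi1),
            -pref * bracket - M * G * L * math.sin(phi2)]


def finite_difference_rhs(y, eps=1e-6):
    """Central differences for (dH/dp1, dH/dp2, -dH/dphi1, -dH/dphi2)."""
    def dH(k):
        up, down = list(y), list(y)
        up[k] += eps
        down[k] -= eps
        return (hamiltonian(up) - hamiltonian(down)) / (2.0 * eps)
    return [dH(2), dH(3), -dH(0), -dH(1)]


y_test = [0.3, -0.7, 1.2, 0.5]  # arbitrary test point (assumed)
max_diff = max(abs(u - v)
               for u, v in zip(F(y_test), finite_difference_rhs(y_test)))
print("largest deviation:", max_diff)
```

If the analytic expressions and the finite differences agree to within the accuracy of the difference quotient, the implementation of $F(y)$ is consistent with Eq. (6.18).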

The following graphs illustrate the dynamics (trajectories in $\varphi$- and $p$-space, as
well as in configuration space) of the pendulum and for this purpose we defined the
parameters $m = l = 1$ and $g = 9.8067$. The time step was chosen to be $\Delta t = 0.001$
and we calculated $N = 60{,}000$ time steps.

We  start  with  Fig.  6.2.  The  two  masses  numbered  1  and  2  are  initially  in  the 
equilibrium  position  (solid  circles).  Both  masses  are  pushed  to  the  right  but  the  push 
on mass 1 [$p_1(0) = 4.0$] is much stronger than the one mass 2 experiences [$p_2(0) = 2.0$].
Thus, mass 2 is 'dragged' along in the process. This is made transparent by two
‘snapshots’  indicated  by  solid  light  gray  circles  and  solid  gray  circles.  The  motion 
of  the  whole  system  is  quite  regular. 
Fig. 6.3 Numerical solution of the double pendulum with initial conditions $\varphi_1(0) = 1.0$, $\varphi_2(0) =
0.0$, $p_1(0) = 0.0$ and $p_2(0) = 3.0$. (a) Trajectory in $\varphi$-space, (b) trajectory in $p$-space, and (c)
trajectory in local $(x, z)$-space. The solid circles numbered 1 and 2 represent the two masses in
their initial configuration


We proceed with Fig. 6.3. In this case mass 1 is displaced from its position by
the initial angular displacement $\varphi_1 = 1.0$. This initial configuration is indicated
by the solid circles numbered 1 and 2 representing the two point-masses. Mass
2 is then pushed to the right with $p_2(0) = 3.0$. Again, mass 1 remains on a
trajectory centered around the point (0,2) in configuration space. But in contrast
to the previous situation it now follows mass 2. Mass 2, on the other hand, develops
a very lively trajectory, Fig. 6.3c. Two snapshots indicated by solid light gray circles
and solid gray circles illustrate configurations of particular interest.

The  dynamics  depicted  in  Fig.  6.4  is  quite  similar  to  the  one  already  discussed 
in  Fig.  6.2.  Initially  both  masses  are  in  the  equilibrium  position  and  then  mass  2  is 
pushed to the right [$p_2(0) = 4.0$]. Thus, mass 1 is trailing behind. In contrast to the
previous  Fig.  6.3  the  trajectory  of  mass  2  will  now  be  symmetric  around  the  z-axis 
given  enough  time.  Again,  snapshots  indicated  by  solid  light  gray  circles  and  solid 
gray  circles  indicate  interesting  configurations. 
Fig. 6.4 Numerical solution of the double pendulum with initial conditions $\varphi_1(0) = \varphi_2(0) = 0.0$,
$p_1(0) = 0.0$ and $p_2(0) = 4.0$. (a) Trajectory in $\varphi$-space, (b) trajectory in $p$-space, and (c) trajectory
in local $(x, z)$-space. The solid circles numbered 1 and 2 represent the two masses in their initial
configuration


The  initial  condition  which  resulted  in  the  trajectory  shown  in  Fig.  6.5  differs 
only  for  mass  2  from  the  initial  conditions  which  lead  to  the  trajectory  in  Fig.  6.4. 
Mass 2 is now pushed even more strongly to the right [$p_2(0) = 5.0$]. Of course, mass
1  is  again  dragging  behind  mass  2.  In  contrast  to  Fig.  6.4  the  initial  momentum  of 
mass  2  is  now  sufficient  to  allow  mass  2  to  pass  through  the  center  of  the  inner 
mass’  circular  trajectory.  Snapshots  indicated  by  light  gray  solid  circles  and  solid 
gray  circles  emphasize  interesting  configurations. 

The  situation  shown  in  Fig.  6.6  differs  from  the  one  of  Fig.  6.5  only  by  the  initial 
condition for mass 2. It is now pushed even more strongly to the right [$p_2(0) = 6.5$]
and  this  initial  momentum  is  sufficient  to  cause  mass  1  to  rotate  around  the  point 
(0,2).  Nevertheless,  mass  1  is  permanently  dragging  behind  mass  2.  Two  interesting 
configurations  are  depicted  by  snapshots  (solid  light  gray  circles  and  solid  gray 
circles). 
Fig. 6.5 Numerical solution of the double pendulum with initial conditions $\varphi_1(0) = \varphi_2(0) = 0.0$,
$p_1(0) = 0.0$ and $p_2(0) = 5.0$. (a) Trajectory in $\varphi$-space, (b) trajectory in $p$-space, and (c) trajectory
in local $(x, z)$-space. The solid circles numbered 1 and 2 represent the two masses in their initial
configuration (The angles $\varphi_2 > \pi$ correspond to complete rotations of the pendulum)


A comparison between trajectories as a result of different initial conditions
reveals that the physical system is highly sensitive to the choice of the initial
conditions $y_0 = [\varphi_1(0), \varphi_2(0), p_1(0), p_2(0)]^T$. For instance, consider Figs. 6.4, 6.5,
and 6.6. In all three cases we chose $y_0$ in such a way that the initial angles
$\varphi_1(0) = \varphi_2(0) = 0$ and the generalized momentum coordinate $p_1(0) = 0$. The
only difference is that we used different values for the initial value of the second
momentum coordinate $p_2$. However, the resulting dynamics of $\varphi_1$ vs. $\varphi_2$ as well as
$p_1$ vs. $p_2$ are entirely different and so are the local $(x, z)$-space trajectories. Hence,
the system is chaotic. In the following section we will briefly discuss a method
designed to characterize chaotic behavior of physical systems [6-10].
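
The sensitivity to initial conditions can also be probed directly by integrating two trajectories whose initial values differ only slightly and by monitoring their distance. The Python sketch below uses the e-RK-4 scheme (6.24) for this purpose. It is a minimal illustration under assumptions that do not come from the text: the perturbation of $10^{-8}$ in $p_2(0)$, the Euclidean norm in $(\varphi_1, \varphi_2, p_1, p_2)$, the integration length and all function names are chosen here.

```python
import math

M, L, G = 1.0, 1.0, 9.8067  # m = l = 1 and g = 9.8067 as quoted above


def hamiltonian(y):
    """Hamilton function (6.18) for y = (phi1, phi2, p1, p2)."""
    phi1, phi2, p1, p2 = y
    delta = phi1 - phi2
    num = p1 * p1 + 2.0 * p2 * p2 - 2.0 * p1 * p2 * math.cos(delta)
    return num / (2.0 * M * L * L * (1.0 + math.sin(delta) ** 2)) + \
        M * G * L * (4.0 - 2.0 * math.cos(phi1) - math.cos(phi2))


def F(y):
    """Right-hand side of Eqs. (6.20)."""
    phi1, phi2, p1, p2 = y
    s, c = math.sin(phi1 - phi2), math.cos(phi1 - phi2)
    den = 1.0 + s * s
    num = p1 * p1 + 2.0 * p2 * p2 - 2.0 * p1 * p2 * c
    pref = 1.0 / (M * L * L * den)
    bracket = -p1 * p2 * s + num / den * s * c
    return [pref * (p1 - p2 * c),
            pref * (2.0 * p2 - p1 * c),
            pref * bracket - 2.0 * M * G * L * math.sin(phi1),
            -pref * bracket - M * G * L * math.sin(phi2)]


def rk4_step(y, dt):
    """One e-RK-4 step, Eq. (6.24), for the autonomous system dy/dt = F(y)."""
    def shifted(k, h):
        return [yi + h * ki for yi, ki in zip(y, k)]
    k1 = F(y)
    k2 = F(shifted(k1, 0.5 * dt))
    k3 = F(shifted(k2, 0.5 * dt))
    k4 = F(shifted(k3, dt))
    return [yi + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]


def separation(y_a, y_b):
    """Euclidean distance of two phase space points (assumed norm)."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(y_a, y_b)))


dt, steps = 0.001, 20000
y_a = [0.0, 0.0, 0.0, 6.5]          # initial condition of Fig. 6.6
y_b = [0.0, 0.0, 0.0, 6.5 + 1e-8]   # slightly perturbed copy (assumed)
h0 = hamiltonian(y_a)

history = []
for n in range(1, steps + 1):
    y_a = rk4_step(y_a, dt)
    y_b = rk4_step(y_b, dt)
    if n % 5000 == 0:
        history.append((n * dt, separation(y_a, y_b)))

energy_error = abs(hamiltonian(y_a) - h0)
for t, d in history:
    print("t =", t, " d(t) =", d)
print("|H(t_end) - H(0)| =", energy_error)
```

The recorded distances show how quickly the initial difference is amplified for this choice of parameters, while the energy error serves as a simple consistency check of the integration.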
Fig. 6.6 Numerical solution of the double pendulum with initial conditions $\varphi_1(0) = \varphi_2(0) = 0.0$,
$p_1(0) = 0.0$ and $p_2(0) = 6.5$. (a) Trajectory in $\varphi$-space, (b) trajectory in $p$-space, and (c) trajectory
in configuration space. The solid circles numbered 1 and 2 represent the two masses in their initial
configuration (The angles $\varphi_2 > \pi$ correspond to complete rotations of the pendulum)


6.3  Numerical  Analysis  of  Chaos 

It is the aim of this section to analyze in more detail the chaotic behavior
observed in the dynamics of the double pendulum. This requires the introduction
of some basic notation. We consider a physical system with $f$ degrees of freedom
where $q_1(t), \ldots, q_f(t)$ denote the generalized coordinates and $p_1(t), \ldots, p_f(t)$ denote
the corresponding generalized momenta. Together, both fully characterize the
state of the system at time $t$. Consequently, the $f$-dimensional vector $q(t) =
[q_1(t), q_2(t), \ldots, q_f(t)]^T$ describes a point in configuration space of the physical
system. In case of a pendulum consisting of $f$ point-masses connected in a
similar fashion as the double pendulum discussed above, which corresponds to the
particular case $f = 2$, the configuration space is constrained to values $\varphi_i \in (-\pi, \pi]$,
$i = 1, \ldots, f$. This resembles an $f$-dimensional torus.

The $2f$-dimensional vector $x(t) = [q_1(t), \ldots, q_f(t), p_1(t), \ldots, p_f(t)]^T$ describes
a point in the phase space of the physical system at some particular time $t$. The
time evolution of a physical system is represented by its phase space trajectory. Of
course, the phase space trajectories $x(t)$ are differentiable with respect to $t$.

We define an autonomous system as a system which is time-invariant, i.e. the
Hamilton function $H(x, t)$ does not depend explicitly on time $t$, $H(x, t) \equiv H(x)$.
Hence, a physical system is referred to as autonomous if the Hamilton function
$H(x, t)$ of the system obeys

$$
\frac{\partial}{\partial t} H(x, t) = 0.
\tag{6.25}
$$

Thus, the total energy is conserved.

An autonomous system is referred to as integrable if it has $f$ independent
invariants $I_1, \ldots, I_f$:

$$ I_j(x) = I_j = \mathrm{const}, \qquad j = 1, \ldots, f . \tag{6.26} $$

One of these is the energy. Each invariant $I_j$ reduces the dimension
of the manifold on which the phase space trajectories can propagate by one. Hence, an
integrable system propagates on an $f$-dimensional subspace of the $2f$-dimensional
phase space. We note that a one-dimensional autonomous system is integrable since
the conservation of energy delivers the required invariant.

On  the  other  hand,  non-integrable  systems  can  show  chaotic  behavior.  In  this 
case  the  trajectories  develop  a  strong  dependence  on  the  initial  conditions  which 
makes  an  analytic  calculation  of  the  dynamics  extremely  difficult.  However,  since 
the  trajectories  can  be  computed  without  problems  by  numeric  means,  we  discuss 
now  how  to  characterize  chaotic  behavior  on  the  computer. 

For this purpose we investigate the dynamics of an autonomous Hamiltonian system
starting from one of two initial conditions, namely $x_0$ and $x_0'$. At time $t$ the system
then arrives at the phase space points $x(t) = \varphi_t(x_0)$ and $x'(t) = \varphi_t(x_0')$, respectively, as
solutions of Hamilton's equations of motion. Here $\varphi_t$ denotes the Hamiltonian flow of the
system, i.e. the symplectic mapping $\varphi_t : x_0 \mapsto x(t)$ from the initial conditions $x_0$ to the
phase space point $x(t)$ at time $t$, as discussed in Sect. 5.4. Since the trajectories in a
chaotic system strongly depend on the initial conditions $x_0$ and $x_0'$, we introduce the
separation between the two trajectories $\varphi_t(x_0)$ and $\varphi_t(x_0')$ at time $t$ as
$d(t) = |\varphi_t(x_0) - \varphi_t(x_0')|$, where $|\cdot|$ denotes some suitable norm. This length can now,
for instance, be used to characterize the stability of the trajectory $\varphi_t(x_0)$ [11].
In particular, a solution $\varphi_t(x_0)$ is referred to as stable if

$$ \forall \varepsilon > 0 \;\; \exists \delta(\varepsilon) > 0 : \;\; \forall x_0' : \;\; d(0) < \delta \;\Rightarrow\; d(t) < \varepsilon , \quad \forall t > 0 . \tag{6.27} $$

In words: we speak of a stable solution if the trajectory $\varphi_t(x_0')$ which corresponds
to the perturbed initial condition $x_0'$ stays within a tube of radius $\varepsilon$ around the
unperturbed trajectory $\varphi_t(x_0)$ for all $t > 0$. Alternatively, a solution is referred
to as asymptotically stable if the distance to adjacent trajectories tends to zero,
i.e. $d(t) \to 0$ as $t \to \infty$. Such solutions tend to attract trajectories from their
neighborhood and, hence, they are referred to as attractors. Finally, a periodic orbit
is defined as a trajectory for which one can find a time $\tau$ such that, for the points $x$ on it,

$$ \varphi_\tau(x) = x , \quad \forall x . \tag{6.28} $$
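As a minimal sketch of how the separation $d(t)$ and the tube criterion of Eq. (6.27) can be monitored on sampled data, the following Python fragment (using NumPy) computes the Euclidean distance between two trajectories stored as arrays. The function names `separation` and `stays_in_tube`, the array layout and the harmonic oscillator toy data are assumptions made for illustration only; a finite sample can indicate, but never prove, stability.

```python
import numpy as np

def separation(traj_a, traj_b):
    """Euclidean distance d(t_k) between two sampled phase space trajectories.

    Both arrays have shape (n_steps, 2f): one row per time step and one
    column per phase space coordinate (assumed layout).
    """
    traj_a = np.asarray(traj_a, dtype=float)
    traj_b = np.asarray(traj_b, dtype=float)
    return np.linalg.norm(traj_a - traj_b, axis=1)

def stays_in_tube(d, eps):
    """True if the sampled separation never reaches the tube radius eps."""
    return bool(np.all(np.asarray(d) < eps))

# Toy data: two harmonic oscillator orbits x(t) = (q, p) whose amplitudes
# differ by one percent (an integrable and stable example).
t = np.linspace(0.0, 20.0, 2001)
orbit = np.column_stack((np.cos(t), -np.sin(t)))
perturbed = 1.01 * orbit
d = separation(orbit, perturbed)
```

For a chaotic trajectory one would expect `d` to grow rapidly until it saturates at the size of the accessible phase space region, so that `stays_in_tube` fails for any small `eps`.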

To find an easy answer to the question whether or not a particular solution
of a non-integrable system is stable, the clear, topological method of Poincaré
maps was introduced. The idea is to reduce the investigation of the complete
$2f$-dimensional phase space trajectory $x(t) = \varphi_t(x_0)$ to the investigation of its
intersection points with a plane $\Sigma$ which is transverse to the flow of the system.
This plane is a subspace of dimension $2f - 1$ and is commonly referred to as
Poincaré section [3]. The transversality of the Poincaré section $\Sigma$ means that
periodic flows intersect this section and never flow parallel to or within it.

Consider a trajectory which is bound to a finite domain, i.e. it does not tend
to infinity in any phase space coordinate. In this case it is possible to define the
Poincaré section in such a way that the trajectory intersects this section not
only once but several times. A Poincaré map is then the mapping of one
intersection point $P$ onto the next intersection point $P'$.

Let us substantiate this idea: we consider an initial condition $x_0$ for which the
trajectory $\Gamma$ is periodic. We choose the initial time $t = 0$ in such a way that $x_0 \in \Sigma$,
where $\Sigma$ is the Poincaré section, Fig. 6.7. We suppose that after a time $\tau(x_0)$
the trajectory intersects this Poincaré section again. Since we demanded that the
trajectory which started in $x_0$ is periodic, we deduce that it intersects the Poincaré
section again at the point $\varphi_{\tau(x_0)}(x_0) = x_0$. We consider now a slightly perturbed
initial condition $x' \in U_0(x_0)$, where $U_0(x_0)$ is referred to as the neighborhood of $x_0$.
In this case the trajectory will in general not be periodic, and the next intersection
point $\varphi_{\tau(x')}(x') \neq x'$. The mapping from one intersection point $x'$ onto the next
intersection point $\varphi_{\tau(x')}(x')$ is called the Poincaré map $P(x') = \varphi_{\tau(x')}(x')$. We note
that the particular point $x_0$ is a fixed point of this mapping, $P(x_0) = x_0$. Furthermore,
we note that if $x' \in U_0(x_0)$ we will have $P(x') \in U_1(x_0)$, where $U_1(x_0)$ is the
neighborhood of first return. This is indicated schematically in Fig. 6.7.


3 Note that we denoted $\tau = \tau(x_0)$ in order to emphasize that the recurrence time $\tau$ will depend on
the initial condition $x_0$.

Fig. 6.7 Schematic illustration of the neighborhood $U_0(x_0)$ and the neighborhood of first return
$U_1(x_0)$ of a periodic trajectory $\Gamma$. The intersection point $x_0$ of $\Gamma$ with $\Sigma$ is a
fixed point of this mapping


We now utilize these concepts and analyze the dynamics of the double pendulum.
We have four phase space coordinates (two generalized coordinates and two generalized
momenta) which, with the help of conservation of
energy, are constrained to a three-dimensional manifold within the four-dimensional
phase space. Since the investigation of these three-dimensional trajectories is very
complex we consider a two-dimensional Poincaré section. For instance, the
coordinates $[\varphi_1(t), p_1(t)]^T$ can be ‘measured’ whenever $\varphi_2(t) = 0$ and $p_2 > 0$. Thus,
the system’s state is registered whenever mass 2 crosses the vertical plane from the
left-hand side.
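The registration rule described above can be sketched in Python as follows. The fragment takes a trajectory that has already been computed and stored as NumPy arrays, looks for sign changes of $\varphi_2$ between consecutive samples, and interpolates linearly to the crossing. The function name `poincare_points`, the linear interpolation and the assumption that $\varphi_2$ has been wrapped to $(-\pi, \pi]$ are illustrative choices, not part of the method as stated in the text.

```python
import numpy as np

def poincare_points(phi1, p1, phi2, p2):
    """Interpolated (phi1, p1) at the crossings phi2 = 0 with p2 > 0.

    All arguments are arrays sampled at the same time steps; phi2 is
    assumed to be wrapped to the interval (-pi, pi].
    """
    phi1, p1, phi2, p2 = (np.asarray(a, dtype=float)
                          for a in (phi1, p1, phi2, p2))
    points = []
    for k in range(len(phi2) - 1):
        a, b = phi2[k], phi2[k + 1]
        crossed = (a < 0.0 <= b) or (a > 0.0 >= b)
        # jumps of about 2*pi are wrap-arounds at +-pi, not crossings of 0
        if crossed and abs(b - a) < np.pi:
            s = a / (a - b)              # fraction of the step where phi2 = 0
            if (1.0 - s) * p2[k] + s * p2[k + 1] > 0.0:
                points.append(((1.0 - s) * phi1[k] + s * phi1[k + 1],
                               (1.0 - s) * p1[k] + s * p1[k + 1]))
    return np.array(points).reshape(-1, 2)

# Synthetic stand-in data (not a double pendulum): phi2 = sin(t), p2 = cos(t)
t = np.linspace(0.0, 14.0, 14001)
pts = poincare_points(t, 2.0 * t, np.sin(t), np.cos(t))
```

In the synthetic example the condition is met at $t = 2\pi$ and $t = 4\pi$, so `pts` contains two rows. For a real run the four arrays would come from the integrator of the equations of motion.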

We discuss now some of the most typical scenarios for Poincaré plots. (Such
a plot represents the Poincaré section together with all intersection points of
a particular trajectory.) Note that this discussion is, of course, not restricted to
the case of the double pendulum. Two different scenarios can be distinguished
for integrable systems: (i) the set of intersection points $(\eta_1, \eta_2, \ldots, \eta_N)$ is finite,
(ii) in the more general case, the number $N$ of intersection points is
infinite. In both cases the intersection points form one-dimensional lines which
do not have to be connected. Figure 6.8a, b illustrates this schematically. However,
if the system is non-integrable, a third scenario is possible: chaotic behavior. In
this case the intersection points appear to be randomly distributed on the
two-dimensional Poincaré section and one observes space-filling behavior. This is
illustrated schematically in Fig. 6.8c. Whether one observes chaotic behavior or not
depends on the choice of the initial conditions.

Fig. 6.8 Schematic illustration of the three types of Poincaré plots as discussed in the text. (a)
Finite number of intersection points, (b) infinite number of intersection points which, however,
form closed lines, (c) space-filling and, consequently, chaotic behavior

Fig. 6.9 Poincaré plot of the double pendulum with initial conditions $\varphi_1(0) = \varphi_2(0) = 0.0$,
$p_1(0) = 4.0$ and $p_2(0) = 2.0$. It corresponds to the situation discussed in Fig. 6.2

Fig. 6.10 Poincaré plot of the double pendulum with initial conditions $\varphi_1(0) = 1.0$, $\varphi_2(0) = 0.0$,
$p_1(0) = 0.0$ and $p_2(0) = 3.0$. It corresponds to the situation discussed in Fig. 6.3

Fig. 6.11 Poincaré plot of the double pendulum with initial conditions $\varphi_1(0) = \varphi_2(0) = 0.0$,
$p_1(0) = 0.0$ and $p_2(0) = 4.0$. It corresponds to the situation discussed in Fig. 6.4

In Figs. 6.9, 6.10, and 6.11 we present Poincaré plots of the double pendulum.
The graphs were obtained with the help of the method discussed above, i.e. the points
were registered whenever $\varphi_2 = 0$ and $p_2 > 0$. Again, we set $m = \ell = 1$ and $g = 9.8067$.
The time step was chosen to be $\Delta t = 0.001$ and we calculated $N = 36 \times 10^4$ time steps.
In Figs. 6.9 and 6.10 we observe regular behavior as it was illustrated in Fig. 6.8b.
In Fig. 6.11 the points are space filling and, consequently, chaotic behavior is observed
in this particular case. Keeping in mind that this particular Poincaré plot refers to the
initial value problem of Fig. 6.4 we conclude that all problems of this series, i.e.
Figs. 6.4, 6.5, and 6.6, are non-integrable and chaotic.


Summary 

The  dynamics  of  the  double  pendulum  is  described  by  a  system  of  four  ordinary 
first  order  differential  equations.  It  is  a  typical  initial  value  problem  and,  thus, 
the  methods  introduced  in  Chap.  5  are  all  candidates  to  find  a  numerical  solution. 
Here we concentrated on the explicit Runge-Kutta algorithm e-RK-4 of Sect. 5.3.
Solutions were studied in detail for several classes of initial conditions. One of the
results  was  that  rather  small  changes  of  the  initial  conditions  could  result  in  rather 
strong,  chaotic  reactions  of  the  outer  mass.  This  triggered  the  obvious  question  about 
the  stability  of  a  numerical  analysis  and  of  physical  dynamics  in  general.  While  the 
stability  of  numerical  methods  has  already  been  discussed  in  Chap.  1  we  focused 
here  on  the  chaotic  behavior  of  Hamiltonian  systems.  Consequently,  a  short  section 
on  the  numerical  analysis  of  chaos  was  added.  It  contained  the  most  important 
concepts  and  in  particular  the  concept  of  the  stability  of  a  phase  space  trajectory 
against variation of initial conditions. Finally, the importance of Poincaré plots
in recognizing whether a system is integrable or non-integrable was explained.
Non-integrable systems can develop chaotic behavior. Thus, Poincaré plots are
an important tool to study chaos in mechanics.


Problems 

1.  Verify  Hamilton’s  equations  of  motion  derived  in  Sect.  6.1.  Implement  the 
e-RK-4  algorithm  discussed  in  Sects.  5.3  and  6.2  to  integrate  the  equations  of 
motion.  Plot  the  trajectories  for  various  initial  conditions.  Use  the  examples 
illustrated  in  Sect.  6.2  to  check  the  code. 

2. Produce Poincaré plots by plotting $(\varphi_1, p_1)$ whenever $\varphi_2 = 0$ and $p_2 > 0$.
The condition $\varphi_2 = 0$ is replaced by $|\varphi_2| < \epsilon$ in the numerical realization.
Note that if the points are space filling the dynamics are chaotic, as discussed in
Sect. 6.3. Try to find different initial conditions which result in regular behavior
and different initial conditions which produce chaotic dynamics.

3. Let $x(t) = [\varphi_1(t), \varphi_2(t), p_1(t), p_2(t)]^T$ and $x'(t) = [\varphi_1'(t), \varphi_2'(t), p_1'(t), p_2'(t)]^T$ be
two trajectories which correspond to different initial conditions $x_0$ and $x_0'$. In this
case the distance between the trajectories is defined as

$$ d(t) = \sqrt{[\varphi_1(t) - \varphi_1'(t)]^2 + [\varphi_2(t) - \varphi_2'(t)]^2 + [p_1(t) - p_1'(t)]^2 + [p_2(t) - p_2'(t)]^2} . $$

Plot the distance $d(t)$ as a function of time $t$ for two different initial conditions.

4. Extend the code of the double pendulum of equal mass and equal length to cover
the case when the lengths and masses of the individual pendula are different,
$\ell_1 \neq \ell_2$ and $m_1 \neq m_2$. What happens? For instance, one can choose a certain
initial condition and keep $\ell_1$ and $\ell_2$ fixed. What is the influence of $\ell_1$ and $\ell_2$ on
the dynamics?

5. Show that the dynamics become integrable in the absence of a gravitational force,
i.e. $g = 0$. What are the conserved quantities? What do the Poincaré plots look
like? Again, try different initial conditions.

Chapter 7

Molecular  Dynamics 


7.1  Introduction 

It  is  the  aim  of  many  branches  of  research  in  physics  to  describe  macroscopic 
properties  of  matter  on  the  basis  of  microscopic  dynamics.  However,  a  description 
of  the  simultaneous  motion  of  a  large  number  of  interacting  particles  is  in  most  cases 
not  feasible  by  analytic  methods.  Moreover,  a  description  is  particularly  difficult  if 
the  interaction  between  the  particles  is  strong.  Within  the  framework  of  statistical 
mechanics  one  tries  to  remedy  these  difficulties  by  employing  some  simplifying 
assumptions  and  by  treating  the  system  from  a  statistical  point  of  view  [1-4]. 
However,  most  of  these  simplifying  assumptions  are  only  justified  within  certain 
limits,  such  as  the  weak  coupling  limit  or  the  low  density  limit.  Nevertheless,  it  is 
not  easy  to  establish  how  the  solutions  acquired  are  influenced  by  these  limits  and 
how  the  physics  beyond  these  limits  can  be  perceived.  This  makes  the  necessity  of 
numerical  solutions  quite  apparent.  There  are  essentially  two  methods  to  determine 
physical  quantities  over  a  restricted  set  of  states,  namely  molecular  dynamics  [5-7] 
and  Monte  Carlo  methods.  The  technique  of  molecular  dynamics  will  be  discussed 
within  this  chapter  while  an  introduction  into  some  basic  features  of  Monte  Carlo 
algorithms  is  postponed  to  the  second  part  of  this  book. 

We strictly focus on a particular sub-field of molecular dynamics, namely on
classical molecular dynamics, i.e. the treatment of classical physical systems.
Extensions to quantum mechanical systems, which are commonly referred to as
quantum molecular dynamics [8], will not be discussed here.

7.2 Classical Molecular Dynamics


The classical model system for molecular dynamics consists of $N$ particles with
positions $r_i = r_i(t)$, velocities $v_i = v_i(t) = \dot r_i(t)$ and masses $m_i$, where
$i = 1, 2, \ldots, N$. We note that $r_i$ and $v_i$ are vectors of the same dimension. We can write
Newton’s equations of motion as

$$ m_i \ddot r_i = f_i(r_1, r_2, \ldots, r_N) , \tag{7.1} $$

where we introduced the forces $f_i = f_i(r_1, r_2, \ldots, r_N)$. Again, we note that the
forces $f_i$ are vectors of the same dimension as $r_i$ and $v_i$. We specify the forces $f_i$
by demanding them to be conservative. Thus, we write

$$ f_i(r_1, r_2, \ldots, r_N) = -\nabla_i U(r_1, r_2, \ldots, r_N) , \tag{7.2} $$

where $\nabla_i$ is the gradient pertaining to the spatial components of the $i$-th particle
and $U(r_1, r_2, \ldots, r_N)$ is some potential which we will abbreviate by dropping its
arguments: $U = U(r_1, r_2, \ldots, r_N)$. We then specify this potential $U$ as the sum of
two-particle interactions $U_{ij}$ and some external potential $U_{\mathrm{ext}}$ as, for instance, the
gravitational field or a static electric potential applied to the system:

$$ U = \frac{1}{2} \sum_{i} \sum_{j \neq i} U_{ij} + U_{\mathrm{ext}} . \tag{7.3} $$


In our discussion of the two-body problem (Appendix A) and, in particular,
of the Kepler problem in Chap. 4 we considered a central potential, which was
proportional to $-1/r$. Due to the conservation of angular momentum, it was
convenient to introduce an effective potential $U_{\mathrm{eff}}$ as the sum of an attractive and
repulsive part as it was defined in Eq. (4.3) and illustrated in Fig. 4.1. In contrast, in
molecular dynamics the most prominent two-body interaction potential is known as
the Lennard-Jones potential [9]. It is of the form

$$ U(|r|) = 4\sigma \left[ \left( \frac{\varepsilon}{|r|} \right)^{12} - \left( \frac{\varepsilon}{|r|} \right)^{6} \right] , \tag{7.4} $$

where $\varepsilon$ and $\sigma$ are real parameters and $|r|$ is the distance between two particles.
The significance of the parameters $\varepsilon$ and $\sigma$ as well as the form of $U(|r|)$ defined by
Eq. (7.4) is illustrated in Fig. 7.1. The Lennard-Jones potential was developed in
particular to model the interaction between neutral atoms or molecules. The
repulsive term, which is proportional to $|r|^{-12}$, describes the Pauli repulsion while
the attractive $|r|^{-6}$ term accounts for attractive van der Waals forces.

Fig. 7.1 Illustration of the Lennard-Jones potential, Eq. (7.4). $\sigma$ describes the
depth of the potential well and $\varepsilon$ is the position of the root of the Lennard-Jones
potential
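The role of the two parameters can be checked with a few lines of Python. The sketch below assumes the usual 12-6 form $U(|r|) = 4\sigma[(\varepsilon/|r|)^{12} - (\varepsilon/|r|)^{6}]$ with the naming used in the caption of Fig. 7.1 ($\sigma$: depth of the well, $\varepsilon$: position of the root); many other texts use the two symbols the other way round. The function name `lennard_jones` and the default values are illustrative assumptions.

```python
import numpy as np

def lennard_jones(r, sigma=1.0, eps=1.0):
    """12-6 Lennard-Jones potential; sigma: well depth, eps: root position.

    Note that this follows the naming of Fig. 7.1, which is swapped with
    respect to the more common convention.
    """
    x = (eps / np.asarray(r, dtype=float)) ** 6
    return 4.0 * sigma * (x * x - x)

r_min = 2.0 ** (1.0 / 6.0)             # location of the minimum for eps = 1
u_root = float(lennard_jones(1.0))     # potential at |r| = eps
u_min = float(lennard_jones(r_min))    # potential at the minimum
```

Up to round-off, `u_root` vanishes and `u_min` equals $-\sigma$, which is the statement of the figure caption.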


We introduce the distance between particles $i$ and $j$ via

$$ r_{ij} = |r_i - r_j| , \tag{7.5} $$

and define the two-body potential

$$ U_{ij} = U(r_{ij}) , \tag{7.6} $$

where $U$ is approximated by the Lennard-Jones potential (7.4). Furthermore, we
deduce from Eq. (7.4) that

$$ f(r) = -\nabla_r U(|r|) = 24\sigma \left[ 2 \left( \frac{\varepsilon}{|r|} \right)^{12} - \left( \frac{\varepsilon}{|r|} \right)^{6} \right] \frac{r}{|r|^2} , \tag{7.7} $$

where we keep in mind that $r$ is a vector. Hence, we write the forces $f_i$ which appear
in Newton’s equations of motion (7.1) with the help of (7.3) in the form

$$ \begin{aligned}
f_i &= -\nabla_i U \\
    &= -\nabla_i \left[ \frac{1}{2} \sum_{k} \sum_{l \neq k} U_{kl} + U_{\mathrm{ext}} \right] \\
    &= -\sum_{j \neq i} \nabla_i U_{ij} - \nabla_i U_{\mathrm{ext}} \\
    &= \sum_{j \neq i} f(r_i - r_j) + f^{i}_{\mathrm{ext}} \\
    &\equiv \sum_{j \neq i} f_{ij} + f^{i}_{\mathrm{ext}} ,
\end{aligned} \tag{7.8} $$

where we implicitly defined the external force $f^{i}_{\mathrm{ext}}$ acting on particle $i$ and the
two-particle forces $f_{ij}$ acting between particles $i$ and $j$. To make visible the road
which guides us to a numerical solution of Newton’s equations of motion (7.1),
we introduce the vectors $R = (r_1, r_2, \ldots, r_N)^T$, $V = (v_1, v_2, \ldots, v_N)^T = \dot R$, and
$F = (f_1/m_1, f_2/m_2, \ldots, f_N/m_N)^T$. This transforms Eq. (7.1) into the very compact
form

$$ \ddot R = F , \tag{7.9} $$

which is equivalent to a set of two first order ordinary differential equations:

$$ \begin{pmatrix} \dot R \\ \dot V \end{pmatrix} = \begin{pmatrix} V \\ F \end{pmatrix} . \tag{7.10} $$

This set is already of the standard form (5.1) of initial value problems.
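To illustrate how the forces entering $F$ might be assembled in practice, the following Python sketch sums the pair forces over all particle pairs for the case without external force. It assumes the pair force $f(r) = 24\sigma[2(\varepsilon/|r|)^{12} - (\varepsilon/|r|)^{6}]\, r/|r|^2$ that follows from the 12-6 Lennard-Jones form (with $\sigma$ the well depth and $\varepsilon$ the root, as in Fig. 7.1). The name `lj_forces`, the array layout `(N, d)` and the plain double loop are illustrative choices; production codes usually add a cut-off radius and neighbor lists.

```python
import numpy as np

def lj_forces(pos, sigma=1.0, eps=1.0):
    """Forces f_i = sum_{j != i} f_ij for positions of shape (N, d).

    No external force is included; sigma is the well depth and eps the
    root of the pair potential (naming of Fig. 7.1).
    """
    pos = np.asarray(pos, dtype=float)
    n = len(pos)
    forces = np.zeros_like(pos)
    for i in range(n - 1):
        for j in range(i + 1, n):
            rij = pos[i] - pos[j]              # vector from particle j to i
            r2 = np.dot(rij, rij)
            x = (eps * eps / r2) ** 3          # (eps / |r|)^6
            fij = 24.0 * sigma * (2.0 * x * x - x) / r2 * rij
            forces[i] += fij
            forces[j] -= fij                   # Newton's third law
    return forces

f_pair = lj_forces([[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
```

Each pair is visited only once and the reaction force is added with opposite sign, so the total force on the system vanishes by construction.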

We are now in a position to proceed with a discussion of some numerical methods
which have been developed in Chap. 5 to solve this initial value problem. For this
purpose, we regard discrete time instances $t_k = k\Delta t$, where $k \in \mathbb{N}$, and function values
at these discrete time instances $t_k$ are denoted by a subscript $k$, as for instance
$R_k = R(t_k)$.

(i) In a first approximation we apply the symplectic Euler method [see
Eq. (4.33)] to Eq. (7.10) and obtain

$$ R_{k+1} = R_k + V_{k+1} \Delta t , \qquad V_{k+1} = V_k + F_k \Delta t . \tag{7.11} $$

Inserting the second into the first equation results in

$$ R_{k+1} = R_k + V_k \Delta t + F_k \Delta t^2 . \tag{7.12} $$

The velocity $V_k$ at time $t_k$ is then approximated by the backward difference
derivative (2.10b), $V_k \approx (R_k - R_{k-1})/\Delta t$, and we find the recursion relation:

$$ R_{k+1} = 2 R_k - R_{k-1} + F_k \Delta t^2 . \tag{7.13} $$

We note that it is only valid for $k \geq 1$. The initialization step necessary to
complete the analysis is found by expanding $R_1$ in a Taylor series up to
second order:

$$ R_1 = R_0 + \Delta t V_0 + \frac{1}{2} F_0 \Delta t^2 . \tag{7.14} $$

This method is referred to as the Störmer-Verlet algorithm [10]. Note
that Eq. (7.14) serves as the initialization of the sequence of time steps.
Furthermore, we remark that Eq. (7.13) could have also been obtained using
the central difference derivative to approximate the second time derivative in
Eq. (7.1):

$$ \ddot R_k \approx \frac{R_{k+1} - 2 R_k + R_{k-1}}{\Delta t^2} . \tag{7.15} $$

In summary, the Verlet or Störmer-Verlet algorithm is defined by the
following set of equations:

$$ \begin{aligned}
R_{k+1} &= 2 R_k - R_{k-1} + F_k \Delta t^2 , \qquad k \geq 1 , \\
R_1 &= R_0 + \Delta t V_0 + \frac{1}{2} F_0 \Delta t^2 .
\end{aligned} \tag{7.16} $$

(ii) We employ the central rectangular rule of integration (Sect. 3.2) in order
to obtain approximations which are formally equivalent to Eq. (5.11). In
particular, we obtain from Eq. (7.10):

$$ R_{k+1} = R_k + V_{k+\frac{1}{2}} \Delta t . \tag{7.17} $$

We note that the value of $V_{k+1/2}$ is yet undetermined. However, it can be
determined in a similar fashion via

$$ V_{k+\frac{1}{2}} = V_{k-\frac{1}{2}} + F_k \Delta t . \tag{7.18} $$

This method is referred to as the leap-frog algorithm and is initialized by the
relation

$$ V_{\frac{1}{2}} = V_0 + \frac{1}{2} F_0 \Delta t . \tag{7.19} $$

This equation can also be obtained by expanding $V_{1/2}$ in a Taylor series up
to first order around the point $t_0 = 0$ and by noting that $\dot V_0 = F_0$. In summary
we write the leap-frog algorithm as

$$ \begin{aligned}
R_{k+1} &= R_k + V_{k+\frac{1}{2}} \Delta t , \\
V_{k+\frac{1}{2}} &= V_{k-\frac{1}{2}} + F_k \Delta t , \\
V_{\frac{1}{2}} &= V_0 + \frac{1}{2} F_0 \Delta t .
\end{aligned} \tag{7.20} $$

(iii) A third, very elegant alternative is the so-called velocity Verlet algorithm.
We expand $R_{k+1}$ in a Taylor series up to second order:

$$ R_{k+1} = R_k + V_k \Delta t + \frac{1}{2} F_k \Delta t^2 . \tag{7.21} $$

This allows us to calculate the spatial coordinates at time $t_{k+1}$ if $R_k$ and $V_k$ are
given. Note that $F_k = F(R_k)$ is completely determined by the positions $R_k$.
Nevertheless, we need one more relation in order to determine the velocities
at times $t_{k+1}$. Again, we expand $V_{k+1}$ in a Taylor series. However, we
approximate the remainder by the arithmetic mean between $t_k$ and $t_{k+1}$:

$$ V_{k+1} = V_k + \frac{1}{2} \left( F_k + F_{k+1} \right) \Delta t . \tag{7.22} $$

The strategy is clear: we calculate the positions $R_{k+1}$ from Eq. (7.21) for given
values of $R_k$ and $V_k$. With the help of $R_{k+1}$ we compute $F_{k+1}$, which is then
inserted into Eq. (7.22) which determines $V_{k+1}$. In summary, the complete
algorithm of the velocity Verlet method is defined by the steps:

$$ \begin{aligned}
R_{k+1} &= R_k + V_k \Delta t + \frac{1}{2} F_k \Delta t^2 , \\
V_{k+1} &= V_k + \frac{1}{2} \left( F_k + F_{k+1} \right) \Delta t .
\end{aligned} \tag{7.23} $$

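We note some properties of these methods. The Störmer-Verlet algorithm
of Eq. (7.16) is time-reversal symmetric (invariant under the transformation
$\Delta t \to -\Delta t$), hence reversible, as is apparent from the symmetric form of the
recursion relation. Moreover, the positions $R_k$ obtained with this method are highly
accurate. However, the velocities do not appear explicitly and have to be estimated
from differences of positions, which yields a rather inaccurate approximation of the
velocities $V_k$. This shortcoming is remedied by the leap-frog algorithm (7.20),
which provides the velocities at half-integer time steps, and by the velocity Verlet
algorithm (7.23), which provides them at the same time instances as the positions.
Eliminating the velocities from Eq. (7.20) or from Eq. (7.23) reproduces the recursion
relation (7.16), so that, apart from rounding errors, both variants generate the same
positions and are time-reversal symmetric as well. Hence, one mainly has to decide
at which time instances and how accurately the velocities are required for the
problem at hand. In many cases the velocity Verlet algorithm is the most popular
choice.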
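The three schemes can be compared directly in a few lines of Python. The sketch below is restricted to a single coordinate and uses the harmonic force $F(R) = -R$ as a test problem; the function names, the scalar setting and the chosen step size are assumptions made for illustration and are not taken from the text.

```python
import numpy as np

def stoermer_verlet(force, r0, v0, dt, n):
    """Positions R_0, ..., R_n from the recursion of Eq. (7.16)."""
    r = np.empty(n + 1)
    r[0] = r0
    r[1] = r0 + dt * v0 + 0.5 * force(r0) * dt**2      # initialization step
    for k in range(1, n):
        r[k + 1] = 2.0 * r[k] - r[k - 1] + force(r[k]) * dt**2
    return r

def leap_frog(force, r0, v0, dt, n):
    """Positions R_0, ..., R_n from the leap-frog steps of Eq. (7.20)."""
    r = np.empty(n + 1)
    r[0] = r0
    v_half = v0 + 0.5 * force(r0) * dt                 # V_{1/2}
    r[1] = r0 + v_half * dt
    for k in range(1, n):
        v_half = v_half + force(r[k]) * dt             # V_{k+1/2}
        r[k + 1] = r[k] + v_half * dt
    return r

def velocity_verlet(force, r0, v0, dt, n):
    """Positions and velocities at the same time instances, Eq. (7.23)."""
    r = np.empty(n + 1)
    v = np.empty(n + 1)
    r[0], v[0] = r0, v0
    f = force(r0)
    for k in range(n):
        r[k + 1] = r[k] + v[k] * dt + 0.5 * f * dt**2
        f_new = force(r[k + 1])                        # F_{k+1} from new positions
        v[k + 1] = v[k] + 0.5 * (f + f_new) * dt
        f = f_new
    return r, v

def harmonic(r):
    """Test force F(R) = -R (unit mass, unit frequency)."""
    return -r

r_sv = stoermer_verlet(harmonic, 1.0, 0.0, 0.01, 1000)
r_lf = leap_frog(harmonic, 1.0, 0.0, 0.01, 1000)
r_vv, v_vv = velocity_verlet(harmonic, 1.0, 0.0, 0.01, 1000)
energy = 0.5 * v_vv**2 + 0.5 * r_vv**2                 # should stay close to 0.5
```

For this test problem the three position arrays agree up to rounding errors, and the energy computed from the velocity Verlet output oscillates around its initial value without drifting. Only `velocity_verlet` returns velocities on the same time grid as the positions.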


7.3  Numerical  Implementation 

The  rough  structure  of  a  molecular  dynamics  code  consists  of  three  crucial  steps, 
namely 

•  initialization,

•  start simulation and equilibrate,

•  continue simulation and store results.

In the following we discuss some of the most important subtleties associated with
these three parts. In particular we will focus on the choice of appropriate boundary
conditions and on the choice of the scales of characteristic quantities.


Boundary  Conditions 

Basically, there are two possibilities: (i) The system is of finite size, and the
implementation of boundary conditions might be straightforward. For instance, let
us assume that we regard $N$ particles within a finite box with reflecting boundaries.
We simply propagate the particle coordinates in time and, if a particle tries to leave
the box, we correct its trajectory according to a reflection law. The velocity is
adjusted accordingly. This is illustrated in Fig. 7.2 for a two-dimensional case and
the particular situation that the particle is reflected from the right-hand boundary of
the box. The corresponding equations read


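$$ \begin{pmatrix} x_{k+1} \\ y_{k+1} \end{pmatrix} = \begin{pmatrix} L - (\tilde x_{k+1} - L) \\ \tilde y_{k+1} \end{pmatrix} \tag{7.24} $$

and

$$ \begin{pmatrix} v_{k+1,x} \\ v_{k+1,y} \end{pmatrix} = \begin{pmatrix} -\tilde v_{k+1,x} \\ \tilde v_{k+1,y} \end{pmatrix} . \tag{7.25} $$

Fig. 7.2 Illustration of the reflection principle for a box of finite dimension with
reflecting boundaries

Here, $L$ denotes the length of the box and $\tilde x_{k+1}$, $\tilde y_{k+1}$, $\tilde v_{k+1,x}$ and $\tilde v_{k+1,y}$ are the
positions and velocities one would have obtained in the absence of the boundary,
see Fig. 7.2.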
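A minimal Python sketch of such a reflection law for one Cartesian component is given below. It assumes that the corrected position is the mirror image of the unconstrained position with respect to the wall and that the velocity component normal to the wall changes sign. The function name `reflect`, the additional treatment of the wall at $x = 0$ and the requirement that a particle moves less than one box length per time step are illustrative assumptions.

```python
def reflect(x_new, v_new, L):
    """Reflect one coordinate at the walls x = 0 and x = L.

    x_new and v_new are the position and velocity component obtained
    without the boundary; the step is assumed to be shorter than L.
    """
    if x_new > L:                       # right-hand wall
        return 2.0 * L - x_new, -v_new
    if x_new < 0.0:                     # left-hand wall (added for symmetry)
        return -x_new, -v_new
    return x_new, v_new                 # particle stayed inside the box

x, vx = reflect(1.2, 0.5, 1.0)          # particle overshoots the right wall
```

The components parallel to the wall are left unchanged, so in two dimensions the function is applied to the $x$ and $y$ components separately.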

(ii) The system is not confined. Then the situation is entirely different. Of course,
one could approximate the infinite volume by a large but finite volume. In such a
case the influence of the constraint to finite size is usually not negligible. The induced
errors are referred to as finite volume effects. A very popular choice is to impose
so-called periodic boundary conditions, which means that a finite system is surrounded
by an infinite number of completely identical replicas of the system, where the forces
are allowed to act across the boundaries. Because of this, calculating the force
on one particle requires the evaluation of an infinite sum. This is numerically not
manageable and we have to find ways to truncate the sum. For instance, it might
be a good approximation to restrict the sum to nearest-neighbor cells. However, the
applicability of such an approach highly depends on the properties of the system
under investigation and, in particular, on the range of the interaction potential.
In case of a Lennard-Jones potential the quantity defining the range of the
interaction potential is $\varepsilon$, see Fig. 7.1.

If a particle leaves the box, it enters the box at the same time on the opposite
side. More generally, due to the requirement of identical replicas, we have for all
observables $O(r)$ that $O(r + nK) = O(r)$, where $r$ lies within the central box, $K$ is
a lattice vector pointing to one of the neighboring cells and $n \in \mathbb{Z}$.
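For a cubic box of edge length $L$ the re-entry on the opposite side and the restriction to the nearest replica can be sketched in Python as follows. The second function implements the so-called minimum image convention, which is one common way to truncate the sum over replicas and is introduced here as an assumption; it is adequate only if the interaction is negligible beyond $L/2$. The names `wrap` and `minimum_image` and the choice of the central box $[0, L)$ are illustrative.

```python
import numpy as np

def wrap(pos, L):
    """Map positions back into the central box [0, L) (up to round-off)."""
    return np.mod(pos, L)

def minimum_image(ri, rj, L):
    """Displacement r_i - r_j to the nearest periodic image of particle j."""
    d = np.asarray(ri, dtype=float) - np.asarray(rj, dtype=float)
    return d - L * np.round(d / L)

wrapped = wrap(np.array([1.25, -0.25]), 1.0)
disp = minimum_image([0.9], [0.1], 1.0)
```

In the example a particle at $0.9$ and one at $0.1$ in a box of length $1$ are separated by $-0.2$ across the boundary rather than by $0.8$ within the box.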

There is another crucial point concerning periodic boundary conditions. In case
of a closed system, the system as a whole is definitely at rest. However, if periodic
boundary conditions are imposed it is possible that the particles move with constant
velocity from one cell to another, which, in our case, resembles circling trajectories.
This is definitely not desirable since a non-zero total velocity contributes to the
kinetic energy and, therefore, distorts the temperature of the system. However, one
can shift the total velocity in order to remedy this problem. In particular, if

$$ v_{\mathrm{tot}} = \sum_{i=1}^{N} v_i \neq 0 , \tag{7.26} $$

the shift

$$ v_i \to v_i - \frac{1}{N} v_{\mathrm{tot}} \tag{7.27} $$

yields the desired result. We note that in a case where all masses are identical, i.e.
$m_1 = m_2 = \ldots = m_N = m$, this is equivalent to $p_{\mathrm{tot}} = m v_{\mathrm{tot}} = 0$.
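In code, removing the total velocity is a single array operation. The Python sketch below assumes that the shift subtracts $v_{\mathrm{tot}}/N$ from every velocity and that the velocities are stored as a NumPy array of shape `(N, d)`; the function name `remove_drift` is hypothetical.

```python
import numpy as np

def remove_drift(vel):
    """Subtract v_tot / N from every velocity; vel has shape (N, d)."""
    vel = np.asarray(vel, dtype=float)
    return vel - vel.sum(axis=0) / len(vel)

v_shifted = remove_drift([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
```

For particles of different masses one would rather remove the total momentum, i.e. subtract the center-of-mass velocity; the version above covers the case of identical masses discussed in the text.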

In conclusion, we remark that the choice of boundary conditions is not the only
item to be considered in the definition of the system. Another quite crucial point
might be the size of the box. If an infinite system is modeled using finite systems,
the dimension of the box must considerably exceed the mean free path of the particles.
Otherwise, the influence of the boundaries is going to perturb significantly the
outcome of the numerical experiment.

Initialization and Equilibration


We recall from statistical physics [1-4] that, according to the equipartition theorem,
every degree of freedom in the system (7.1) contributes on average $k_B T/2$ to the
total kinetic energy. Here $k_B$ is Boltzmann’s constant and $T$ is the temperature.
If we regard $N$ particles, which move in a $d$-dimensional space, we have $d(N - 1)$
degrees of freedom, if we demand that $v_{\mathrm{tot}} = 0$. Hence, we have

$$ E_{\mathrm{kin}} = \frac{1}{2} \sum_{i=1}^{N} m_i v_i^2 = \frac{d(N-1)}{2} k_B T , \tag{7.28} $$

which gives a relation from which we can determine the temperature of the system:

$$ k_B T = \frac{1}{d(N-1)} \sum_{i=1}^{N} m_i v_i^2 . \tag{7.29} $$

However, in many applications the system is supposed to be simulated at a given
temperature, i.e. the temperature $T$ is an input rather than an output parameter and
is supposed to stay constant during the simulation. We can control the temperature
by rescaling the velocities and this might be necessary at several times during the
simulation in order to guarantee a constant temperature. We define

$$ v_i' = \lambda v_i , \tag{7.30} $$

where $\lambda$ is a rescaling parameter. The temperature associated with the velocities $v_i'$
is given by

$$ k_B T' = \frac{\lambda^2}{d(N-1)} \sum_{i=1}^{N} m_i v_i^2 . \tag{7.31} $$

This allows us to determine how to choose $\lambda$ in order to obtain a certain temperature
$T'$:

$$ \lambda = \sqrt{\frac{d(N-1) k_B T'}{2 E_{\mathrm{kin}}}} = \sqrt{\frac{T'}{T}} . \tag{7.32} $$

We note that if the total velocity, which is the sum of all velocities, is zero, the
total velocity corresponding to the rescaled velocities $v_i'$ is also equal to zero since

$$ \sum_{i=1}^{N} v_i' = \lambda \sum_{i=1}^{N} v_i = 0 . \tag{7.33} $$

This ensures that rescaling of the velocities does not induce a bias.
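A possible realization of the temperature measurement and of the velocity rescaling in Python is sketched below. It assumes the relation $k_B T = \sum_i m_i v_i^2 / [d(N-1)]$ for a system with vanishing total velocity and the rescaling factor $\lambda = \sqrt{T'/T}$; the function names, the default $k_B = 1$ (reduced units) and the array layout `(N, d)` are illustrative assumptions.

```python
import numpy as np

def kinetic_temperature(vel, mass, kB=1.0):
    """Temperature from the kinetic energy with d(N - 1) degrees of freedom.

    vel has shape (N, d); mass is a scalar or an array of length N.
    The total velocity is assumed to vanish.
    """
    vel = np.asarray(vel, dtype=float)
    n, d = vel.shape
    m = np.broadcast_to(np.asarray(mass, dtype=float), (n,))
    return np.sum(m[:, None] * vel**2) / (d * (n - 1) * kB)

def rescale(vel, mass, T_target, kB=1.0):
    """Rescale all velocities by lambda so that the temperature is T_target."""
    lam = np.sqrt(T_target / kinetic_temperature(vel, mass, kB))
    return lam * np.asarray(vel, dtype=float)

rng = np.random.default_rng(1)
v = rng.normal(size=(50, 3))
v -= v.mean(axis=0)                     # make the total velocity vanish
v_scaled = rescale(v, 1.0, 2.5)
```

After the call, `kinetic_temperature(v_scaled, 1.0)` returns the target value, and the total velocity is still zero because all velocities were multiplied by the same factor.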

The choice of the initial conditions highly influences the time the system needs
to reach thermal equilibrium. For instance, if a gas is to be simulated at a given
temperature $T$ it might be advantageous to choose the initial velocities according to
a Maxwell-Boltzmann distribution. The Maxwell-Boltzmann distribution
states that the probability [more precisely: the pdf (probability density function)
describing the probability, see Appendix E] that a particle with mass $m$ has velocity
$v$ is proportional to

$$ p(|v|) \propto |v|^2 \exp\left( -\frac{m |v|^2}{2 k_B T} \right) . \tag{7.34} $$
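One way to draw such initial velocities, sketched here in Python, uses the fact that for a Maxwell-Boltzmann distribution every Cartesian velocity component is normally distributed with zero mean and variance $k_B T/m$. The function name, the default values ($k_B = 1$, unit mass) and the use of NumPy's random generator with a fixed seed are assumptions for illustration.

```python
import numpy as np

def maxwell_boltzmann_velocities(n, d, T, mass=1.0, kB=1.0, seed=0):
    """Draw n velocities in d dimensions for temperature T.

    Each Cartesian component is Gaussian with variance kB * T / mass.
    """
    rng = np.random.default_rng(seed)
    return rng.normal(0.0, np.sqrt(kB * T / mass), size=(n, d))

vel = maxwell_boltzmann_velocities(100000, 3, T=2.0)
speeds = np.linalg.norm(vel, axis=1)
```

Because the sample is finite, the total velocity and the measured temperature of `vel` will deviate slightly from zero and from `T`. In practice one would therefore remove the drift and rescale the velocities afterwards.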


Another  intriguing  question  is  how  to  check  whether  or  not  thermal  equilibrium 
has been reached. In statistical mechanics one is usually confronted with expectation
values of observables $O(t)$ as a function of time. The expectation value $\langle O \rangle$ is
defined as

$$ \langle O \rangle = \lim_{\tau \to \infty} \frac{1}{\tau} \int_0^{\tau} {\rm d}t\, O(t) . \tag{7.35} $$

Since $O(t)$ is not known analytically, one replaces this time average by the arithmetic
mean

$$ \langle O \rangle \approx \overline{O} = \frac{1}{n} \sum_{j=k+1}^{k+n} O(t_j) . \tag{7.36} $$

If $n$ and $k$ are sufficiently large, the average value can be regarded as converged.
In particular, one has to choose $n$ reasonably large and then find $k$ in such a way
that for all values $k' > k$ the same result for $\overline{O}$ is obtained. Hence, equilibrium has
been reached after $k$ time-steps and it is now possible to 'measure' the observables
by calculating their mean values. A more detailed discussion of this procedure,
including, for instance, the influence of time correlations and more advanced
techniques, is postponed to Chap. 19.
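The search for $k$ can be sketched as follows. The helper names, the tolerance-based notion of 'the same result' and the synthetic, exponentially relaxing observable are assumptions chosen for illustration. The sketch also ignores time correlations, which are deferred to a later chapter.

```python
import math

def window_mean(obs, k, n):
    """Arithmetic mean of O(t_j) for j = k+1, ..., k+n (obs is a 0-based list)."""
    return sum(obs[k:k + n]) / n

def equilibration_index(obs, n, tol):
    """Smallest k such that all window means with k' >= k agree with the last one within tol."""
    ks = list(range(0, len(obs) - n + 1))
    means = [window_mean(obs, k, n) for k in ks]
    ref = means[-1]
    k_eq = ks[-1]
    # Walk backwards until a window mean deviates from the reference value.
    for k, m in zip(reversed(ks), reversed(means)):
        if abs(m - ref) > tol:
            break
        k_eq = k
    return k_eq, ref

# Synthetic observable relaxing towards the equilibrium value 1 (illustrative data).
observable = [1.0 + math.exp(-j / 50.0) for j in range(2000)]
k_eq, mean_value = equilibration_index(observable, n=200, tol=1e-3)
```

In a real simulation the observable fluctuates, so the tolerance has to be chosen larger than the statistical error of a window mean.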

There is one last point: In many cases the natural units of the physical system
might be disadvantageous because they are likely to induce numerical instabilities.
In such cases a common technique is to switch to rescaled variables by introducing
new units, which are characteristic quantities of the system, and to express all
physical quantities in these new units. For instance, one might introduce the
length $L$ of the box as the unit of space. The new spatial coordinates would then be
given by

$$ r' = \frac{r}{L} . \tag{7.37} $$

Hence, all coordinates take on values within the interval $r' \in [0, 1]$. However, one
cannot  introduce  an  arbitrary  set  of  characteristic  quantities  due  to  the  physical 
relations they have to obey. For instance, one might introduce a characteristic
energy $E_0$, a characteristic length $\lambda$, and a characteristic mass $m$. In this case the
characteristic temperature $\tilde{T}$ is determined via

$$ \tilde{T} = \frac{E_0}{k_{\rm B}} . \tag{7.38} $$

Moreover, the characteristic time $\tau$ is fixed to the value

$$ \tau = \lambda \sqrt{\frac{m}{E_0}} , \tag{7.39} $$

which results from the relation between the kinetic energy and the velocity.
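As a small numerical illustration, the snippet below evaluates a characteristic temperature and time from a chosen energy, length and mass. It assumes the standard relations $\tilde{T} = E_0 / k_{\rm B}$ and $\tau = \lambda \sqrt{m / E_0}$. The Lennard-Jones parameters are values commonly quoted for argon and serve only as an example; they are not taken from the text.

```python
import math

K_B = 1.380649e-23               # Boltzmann constant in J/K
ATOMIC_MASS = 1.66053906660e-27  # atomic mass unit in kg

def characteristic_scales(e0, length, mass):
    """Return (characteristic temperature, characteristic time) for the given units."""
    return e0 / K_B, length * math.sqrt(mass / e0)

# Example values often used for argon (assumed here for illustration only).
epsilon = 119.8 * K_B            # energy unit in J
sigma = 3.405e-10                # length unit in m
mass = 39.948 * ATOMIC_MASS      # mass unit in kg
t_char, tau = characteristic_scales(epsilon, sigma, mass)
```

With these numbers the time unit is of the order of picoseconds, so a reduced time step of $10^{-3}$ corresponds to a few femtoseconds.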

To illustrate a molecular dynamics simulation we study a set of $N = 100$ particles
of mass $m = 1$ which are subject to a Lennard-Jones potential (7.4) characterized
by $\epsilon = \sigma = 1$ and to a gravitational force $mg$, $g = 9.81$. At initialization the
particles are placed in a $10 \times 10$ lattice starting with the lower left-hand corner at
$x = 10.5$ and $y = 10$. The particles are equally spaced with $\Delta x = \Delta y = \sigma$.
This initial configuration is shown in Fig. 7.3a. Furthermore, the left-hand side, the
right-hand side, and the bottom of the confinement ($L = 30$) are described by
reflecting boundary conditions, Eqs. (7.24) and (7.25). The confinement is open at
the top, i.e. it extends to infinity. The time step is given by $\Delta t = 10^{-3}$. Figures 7.3b-d
show how the system has developed after 1200, 1800, and 3000 time steps,
respectively.

This  chapter  closes  our  discussion  of  the  numerics  of  initial  value  problems.  In 
the  following  chapters  we  will  introduce  some  of  the  basic  concepts  developed  to 
solve  boundary  value  problems  with  numerical  methods. 


Summary 

This chapter dealt with the classical dynamics of many particles (not necessarily
identical particles) which are confined in a box of finite dimension or
which are allowed to roam freely in infinite space. The particles are subject
to a particle-particle interaction and to an external force. The discussion was
restricted to classical molecular dynamics. From Newton's equations of motion
for $N$ interacting particles numerical methods were developed which allowed the
simulation of the particles' dynamics. Based on the symplectic Euler method the
Störmer-Verlet algorithm was derived. Another approach was based on the
central rectangular rule and resulted in the leap-frog algorithm. Finally, the velocity
Verlet algorithm was introduced. All three methods have their merits. The
first gives very accurate results for the particles' positions but calculates inaccurate
velocities. It has the advantage that it is time reversible. The other two methods
lack this property but give very accurate estimates of the particles' velocities. The
final part of this chapter was dedicated to the discussion of various subtleties of
the numerical implementation of these algorithms, namely: (i) definition of
boundary conditions, (ii) initialization of the algorithm, (iii) equilibration to a given
temperature, (iv) ensuring constant temperature throughout the simulation, and (v)
transformation to rescaled variables.

Fig. 7.3 (a) Initial configuration: the particles are placed in a $10 \times 10$ equally spaced lattice starting
with $x = 10.5$ and $y = 10.0$; $g = 9.81$. The initial velocities are equal to zero. (b) Configuration
after 1200 time steps. (c) Configuration after 1800 time steps. (d) Configuration after 3000 time
steps


Problems 

1. We investigate the pendulum of Chap. 1 and write its equation of motion as

$$ \ddot{x} + \omega^2 x = 0 , $$

with $\omega = \sqrt{g/\ell}$. The Störmer-Verlet algorithm is applied to simulate the
pendulum's motion and to compare the numerical results with the exact solution.
Demonstrate that the result is very sensitive to the choice of the time step $\Delta t$ and,
in particular, of the product $\omega \Delta t$. Note that in this particular case the
Störmer-Verlet algorithm can also be studied analytically! What happens for the choice
$\omega \Delta t = 1$ or $\omega \Delta t > 2$? Which conclusions can be drawn from this example for a
proper choice of the time discretization?

Try the other two methods to simulate the pendulum's dynamics.

2.  Write  a  molecular  dynamics  code  with  the  help  of  the  following  instructions.  You 
can  use  either  the  leap-frog  or  the  velocity  Verlet  algorithm.  We  consider  the 
following  system: 

• There are $N = 100$ particles in a two-dimensional box with side length
$L = 30$. The boundaries at the bottom, at the left-hand and at the right-hand side
are considered as reflecting, as in Fig. 7.2. The top of the box is regarded as
open (no periodic boundary condition or reflecting boundary is imposed).

• The particles interact through a Lennard-Jones potential of the form (7.4)
where $\epsilon$ and $\sigma$ define the interaction.

• Furthermore, a gravitational force $F_{\rm ext} = -m g\, e_y$ acts on each particle, where
$m$ is the particle's mass, $g$ is the acceleration due to gravity, and $e_y$ denotes the
unit vector in $y$-direction.

• As an initial condition, the particles can be placed within the box on a regular
lattice, where the distance between the particles is the characteristic distance
according to the Lennard-Jones potential, i.e. $\sigma$. The form and position of
this lattice is arbitrary. This is illustrated in Fig. 7.3.

We measure the velocities and the positions of all particles. Since the particles'
velocities and positions are to be analyzed with the help of an extra program, the
data are written to external files (it is not necessary to save all time steps!).
Perform the following analysis:

• Determine the temperature $T$ from the kinetic energy as discussed in this
chapter. Note that in this particular case we do not demand that $v_{\rm tot} = 0$!

• Try different initial conditions. For instance, set the initial velocity equal to
zero and stack the particles in different geometric configurations (rectangle,
triangle, ...; one can also use more than one configuration at the same time!).
The nearest neighbor distance between the particles can be set equal to $\sigma$.
Choose one configuration and place it at different positions in the box. What
happens?

• Set $\epsilon = \sigma = m = 1$ (we change the units) and set the inter-atomic distance in
the initial condition to $2^{1/6} \sigma$. (Why?) Vary the gravitational acceleration $g$
(different systems of units) in order to simulate different states of matter. The
reference program showed solid behavior for $g \approx 0$, liquid behavior for
$g \approx 0.1$ and gaseous behavior for $g > 1$. Explain this behavior!

• Measure the particle density $\rho(h)$ as a function of the height $h$. You should be
able to reproduce the barometric formula:

$$ \rho \propto \rho_0 \exp\{-\gamma h / T\} , \qquad \gamma > 0 . $$

• Determine the momentum distribution ($p_i = m v_i$) of the particles and
demonstrate that it follows a Maxwell-Boltzmann distribution

$$ p(|v|) \propto |v|^2 \exp\{-\gamma |v|^2 / T\} , \qquad \gamma > 0 , $$

with $|v| = \sqrt{v_x^2 + v_y^2}$ the Euclidean norm.
Illustrate the results of the simulation graphically.
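As a possible starting point for Problem 1, the following sketch iterates the position form of the Störmer-Verlet scheme for the harmonic oscillator, $x_{n+1} = 2 x_n - x_{n-1} - (\omega \Delta t)^2 x_n$. The function name, the Taylor-expansion start for a particle released at rest and the chosen parameter values are assumptions made for illustration.

```python
def stoermer_verlet_oscillator(omega_dt, steps, x0=1.0):
    """Iterate x_{n+1} = 2 x_n - x_{n-1} - (omega*dt)^2 x_n for zero initial velocity."""
    s2 = omega_dt * omega_dt
    x_prev = x0
    x = x0 * (1.0 - 0.5 * s2)          # second-order Taylor start with v(0) = 0
    trajectory = [x_prev, x]
    for _ in range(steps - 1):
        x_prev, x = x, (2.0 - s2) * x - x_prev
        trajectory.append(x)
    return trajectory

# Largest amplitude reached for a small and for a large value of omega*dt.
bounded = max(abs(x) for x in stoermer_verlet_oscillator(0.1, 1000))
growing = max(abs(x) for x in stoermer_verlet_oscillator(2.1, 100))
```

Comparing `bounded` and `growing` for several values of `omega_dt` gives a first impression of how strongly the result depends on the product $\omega \Delta t$.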
Chapter 8

Numerics  of  Ordinary  Differential  Equations: 
Boundary  Value  Problems 


8.1  Introduction 


It is the aim of this chapter to introduce some of the basic methods developed to
solve boundary value problems. Since a treatment of all available concepts is far
too extensive, we will concentrate on two approaches, namely the finite difference
approach and shooting methods [1-5]. Furthermore, we will strictly focus on linear
boundary value problems defined on a finite interval $[a, b] \subset \mathbb{R}$. A boundary value
problem is referred to as linear if both the differential equation and the boundary
conditions are linear. Such a problem of order $n$ is of the form

$$ L[y] = f(x) , \quad x \in [a, b] , \qquad U_\nu[y] = \lambda_\nu , \quad \nu = 1, \ldots, n . \tag{8.1} $$

Here, $L[y]$ is a linear operator

$$ L[y] = \sum_{k=0}^{n} a_k(x)\, y^{(k)}(x) , \tag{8.2} $$

where $y^{(k)}(x)$ denotes the $k$-th spatial derivative of $y(x)$, i.e. $y^{(k)} = {\rm d}^k y(x) / {\rm d}x^k$, and
$f(x)$ as well as the $a_k(x)$ are given functions which we assume to be continuous.
Accordingly, linear boundary conditions $U_\nu[y]$ can be formulated as

$$ U_\nu[y] = \sum_{k=0}^{n-1} \left[ \alpha_{\nu k}\, y^{(k)}(a) + \beta_{\nu k}\, y^{(k)}(b) \right] = \lambda_\nu , \tag{8.3} $$

where the $\alpha_{\nu k}$, $\beta_{\nu k}$ and $\lambda_\nu$ are given constants. The question in which cases a solution
to the boundary value problem (8.1) exists and whether or not this solution will be
unique [6] will not be discussed here.

Let us introduce some further conventions: The differential equation in the first
line of Eq. (8.1) is referred to as homogeneous if the function $f(x) = 0$ for all
$x \in [a, b]$. In analogy, the boundary conditions are referred to as homogeneous if the
constants $\lambda_\nu = 0$ for all $\nu = 1, \ldots, n$. Finally, the boundary value problem (8.1)
is  referred  to  as  homogeneous  if  the  differential  equation  is  homogeneous  and  the 
boundary  conditions  are  homogeneous  as  well.  In  all  other  cases  it  is  referred  to  as 
inhomogeneous.  Moreover,  the  boundary  conditions  are  said  to  be  decoupled  if  the 
function  values  at  the  two  different  boundaries  do  not  mix. 

One of the most important types of boundary value problems in physics is the linear
second order boundary value problem with decoupled boundary conditions. It
is of the form:

$$ a_2(x) y''(x) + a_1(x) y'(x) + a_0(x) y(x) = f(x) , \qquad x \in [a, b] , \tag{8.4a} $$
$$ \alpha_0 y(a) + \alpha_1 y'(a) = \lambda_1 , \qquad |\alpha_0| + |\alpha_1| \neq 0 , \tag{8.4b} $$
$$ \beta_0 y(b) + \beta_1 y'(b) = \lambda_2 , \qquad |\beta_0| + |\beta_1| \neq 0 . \tag{8.4c} $$

This  chapter  focuses  mainly  on  problems  of  this  kind. 

In  particular,  for  second  order  differential  equations,  boundary  conditions  of  the 
form 


$$ y(a) = \alpha , \qquad y(b) = \beta , \tag{8.5} $$

are referred to as boundary conditions of the first kind or Dirichlet boundary
conditions. On the other hand, boundary conditions of the form

$$ y'(a) = \alpha , \qquad y'(b) = \beta , \tag{8.6} $$

are referred to as boundary conditions of the second kind or Neumann boundary
conditions and boundary conditions of the form (8.4) are referred to as boundary
conditions of the third kind or Sturm boundary conditions.

We note that the particular case of decoupled boundary conditions does not
include problems like

$$ y(a) = y(b) \neq 0 . \tag{8.7} $$

We  encountered  such  a  condition  in  Sect.  7.3  where  we  introduced  boundary 
conditions  of  this  form  as  periodic  boundary  conditions. 

In  the  following  section  the  method  of  finite  differences  will  be  applied  to  solve 
boundary  value  problems  of  the  form  (8.4).  On  the  other  hand,  shooting  methods,  in 
particular  the  method  developed  by  Numerov  (see,  for  instance,  [7]  and  references 
therein),  will  be  the  topic  of  the  third  section. 


A common alternative in the case of constant coefficients is to solve the
differential  equation  with  the  help  of  Fourier  transform  techniques.  A  brief 
introduction  to  the  numerical  implementation  of  the  Fourier  transform  is  given 
in  Appendix  D. 


8.2  Finite  Difference  Approach 

For illustrative purposes, we regard a boundary value problem of the form (8.4). The
extension to more complex problems might be tedious but follows the same line of
arguments. We discretize the interval $[a, b]$ according to the recipe introduced in
Chap. 2: the positions $x_k$ are given by $x_k = a + (k - 1) h$, where the grid-spacing $h$
is determined via the maximum number of grid-points $N$ as $h = (b - a)/(N - 1)$.
Hence, we have $x_1 = a$ and $x_N = b$. Furthermore, we use the notation $y_k = y(x_k)$
for all $k = 1, \ldots, N$. It will be used for all functions which appear in Eqs. (8.4).

Let us now employ the central difference derivative (2.10c) in order to approximate

$$ y_k'' = y''(x_k) \approx \frac{y_{k+1} - 2 y_k + y_{k-1}}{h^2} \tag{8.8} $$

for $k = 2, \ldots, N - 1$ and

$$ y_k' = y'(x_k) \approx \frac{y_{k+1} - y_{k-1}}{2h} . \tag{8.9} $$


The boundary points $x_1$ and $x_N$ will be treated in a separate step. In order to
abbreviate the notation we will rewrite the differential equation (8.4) without the
indices as

$$ a(x) y''(x) + b(x) y'(x) + c(x) y(x) = f(x) . \tag{8.10} $$


Equations (8.8) and (8.9) are then applied and we arrive at the difference equation

$$ a_k \frac{y_{k+1} - 2 y_k + y_{k-1}}{h^2} + b_k \frac{y_{k+1} - y_{k-1}}{2h} + c_k y_k = f_k , \tag{8.11} $$

where $k = 2, \ldots, N - 1$. Sorting the $y_k$ yields

$$ \left( \frac{a_k}{h^2} - \frac{b_k}{2h} \right) y_{k-1} + \left( c_k - \frac{2 a_k}{h^2} \right) y_k + \left( \frac{a_k}{h^2} + \frac{b_k}{2h} \right) y_{k+1} = f_k , \tag{8.12} $$

and this equation is only valid for $k = 2, \ldots, N - 1$ because we defined $N$ grid-points
within the interval $[a, b]$.
A final step is necessary in which the boundary conditions are incorporated. This
will then enable us to reduce the whole problem to a system of linear equations.
Decoupled boundary conditions of a second order differential equation for the
left-hand boundary (8.4b) are of the general form:

$$ \alpha_0 y(a) + \alpha_1 y'(a) = \lambda_1 , \qquad |\alpha_0| + |\alpha_1| \neq 0 . \tag{8.13} $$

In analogy, we find for the right-hand boundary (8.4c):

$$ \beta_0 y(b) + \beta_1 y'(b) = \lambda_2 , \qquad |\beta_0| + |\beta_1| \neq 0 . \tag{8.14} $$

We discretize $y'(a)$ as

$$ y_1' = y'(a) \approx \frac{y_2 - y_0}{2h} , \tag{8.15} $$

and set $y_1 = y(a)$. Note that the function value $y_0$ in Eq. (8.15) is unknown since
the virtual point $x_0 = a - h$ is not within our interval $[a, b]$. Nevertheless, we use
Eq. (8.15) in Eq. (8.13) and obtain:

$$ \alpha_0 y_1 + \alpha_1 \frac{y_2 - y_0}{2h} = \lambda_1 . \tag{8.16} $$

We now solve Eq. (8.16) for $y_0$ under the premise that $\alpha_1 \neq 0$,

$$ y_0 = y_2 - \frac{2h}{\alpha_1} \left( \lambda_1 - \alpha_0 y_1 \right) , \tag{8.17} $$


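rewrite Eq. (8.12) for $k = 1$,

$$ \left( \frac{a_1}{h^2} - \frac{b_1}{2h} \right) y_0 + \left( c_1 - \frac{2 a_1}{h^2} \right) y_1 + \left( \frac{a_1}{h^2} + \frac{b_1}{2h} \right) y_2 = f_1 , \tag{8.18} $$

and insert (8.17) into (8.18):

$$ \left[ c_1 - \frac{2 a_1}{h^2} - \frac{\alpha_0}{\alpha_1} \left( b_1 - \frac{2 a_1}{h} \right) \right] y_1 + \frac{2 a_1}{h^2}\, y_2 = f_1 - \frac{\lambda_1}{\alpha_1} \left( b_1 - \frac{2 a_1}{h} \right) . \tag{8.19} $$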


On the other hand, in the specific case of $\alpha_1 = 0$ we immediately obtain from
Eq. (8.16):

$$ y_1 = \frac{\lambda_1}{\alpha_0} . \tag{8.20} $$
The same strategy can be applied to incorporate the right-hand side boundary
condition, Eq. (8.14): We discretize Eq. (8.14) by introducing the function value
$y_{N+1}$ at the virtual grid-point $x_{N+1} = b + h$ outside the interval $[a, b]$ via

$$ \beta_0 y_N + \beta_1 \frac{y_{N+1} - y_{N-1}}{2h} = \lambda_2 . \tag{8.21} $$

This equation is solved for $y_{N+1}$ under the premise that $\beta_1 \neq 0$,

$$ y_{N+1} = y_{N-1} + \frac{2h}{\beta_1} \left( \lambda_2 - \beta_0 y_N \right) , \tag{8.22} $$

and the result is inserted into Eq. (8.12) for $k = N$. This results in:

$$ \frac{2 a_N}{h^2}\, y_{N-1} + \left[ c_N - \frac{2 a_N}{h^2} - \frac{\beta_0}{\beta_1} \left( \frac{2 a_N}{h} + b_N \right) \right] y_N = f_N - \frac{\lambda_2}{\beta_1} \left( \frac{2 a_N}{h} + b_N \right) . \tag{8.23} $$

In the specific case $\beta_1 = 0$, the value $y_N$ is fixed at the boundary and one obtains
from Eq. (8.14):

$$ y_N = \frac{\lambda_2}{\beta_0} . \tag{8.24} $$

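All these manipulations reduced the boundary value problem to a system of
inhomogeneous linear equations, namely Eqs. (8.12), (8.19), and (8.23). It can be
written as

$$ A y = F , \tag{8.25} $$

where we introduced the vector $y = (y_1, y_2, \ldots, y_N)^T$, the vector $F$

$$ F = \begin{pmatrix} f_1 - \dfrac{\lambda_1}{\alpha_1} \left( b_1 - \dfrac{2 a_1}{h} \right) \\ f_2 \\ \vdots \\ f_{N-1} \\ f_N - \dfrac{\lambda_2}{\beta_1} \left( \dfrac{2 a_N}{h} + b_N \right) \end{pmatrix} , \tag{8.26} $$

and the tridiagonal matrix $A$:

$$ A = \begin{pmatrix} B_1 & C_1 & 0 & \cdots & 0 \\ A_2 & B_2 & C_2 & \ddots & \vdots \\ 0 & \ddots & \ddots & \ddots & 0 \\ \vdots & \ddots & A_{N-1} & B_{N-1} & C_{N-1} \\ 0 & \cdots & 0 & A_N & B_N \end{pmatrix} . \tag{8.27} $$

Here we defined

$$ A_k = \begin{cases} \dfrac{a_k}{h^2} - \dfrac{b_k}{2h} , & k = 2, \ldots, N - 1 , \\ \dfrac{2 a_N}{h^2} , & k = N , \end{cases} \tag{8.28} $$

$$ B_k = \begin{cases} c_1 - \dfrac{2 a_1}{h^2} - \dfrac{\alpha_0}{\alpha_1} \left( b_1 - \dfrac{2 a_1}{h} \right) , & k = 1 , \\ c_k - \dfrac{2 a_k}{h^2} , & k = 2, \ldots, N - 1 , \\ c_N - \dfrac{2 a_N}{h^2} - \dfrac{\beta_0}{\beta_1} \left( \dfrac{2 a_N}{h} + b_N \right) , & k = N , \end{cases} \tag{8.29} $$

and, finally,

$$ C_k = \begin{cases} \dfrac{2 a_1}{h^2} , & k = 1 , \\ \dfrac{a_k}{h^2} + \dfrac{b_k}{2h} , & k = 2, \ldots, N - 1 . \end{cases} \tag{8.30} $$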


The remaining task is now to solve the linear system of equations (8.25). (A
brief introduction to the numerical treatment of linear systems of equations can
be found in Appendix C.) Very effective methods exist for cases where the matrix
$A$ is tridiagonal [8], as is the case here. Although we discussed the method of
finite differences for the particular case of a second order differential equation with
decoupled boundary conditions, the same strategy can be employed to derive similar
methods for higher order boundary value problems. However, these methods will, in
general, be more complex. Furthermore, we note that in cases where $\alpha_1 = \beta_1 = 0$
the function values at the boundaries $y_1$ and $y_N$ are fixed and the corresponding
system of linear equations reduces to $(N - 2)$ equations.
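The complete procedure can be sketched in Python as follows. The function name and its argument layout are assumptions. The tridiagonal system is solved with the Thomas algorithm (forward elimination and back substitution without pivoting), chosen here as one common method for such matrices. For simplicity the sketch keeps all $N$ unknowns even when a boundary value is fixed, replacing the corresponding equation by a trivial row. The test problem $y'' - y = 0$ with $y(0) + y'(0) = 2$, $y(1) = e$ and exact solution $y = e^x$ is likewise only an illustration.

```python
import math

def solve_bvp_fd(a, b, c, f, xa, xb, alpha, beta, lam, n):
    """Solve a(x) y'' + b(x) y' + c(x) y = f(x) on [xa, xb] by central differences.

    Boundary conditions: alpha[0]*y(xa) + alpha[1]*y'(xa) = lam[0]
                         beta[0]*y(xb)  + beta[1]*y'(xb)  = lam[1]
    Returns the grid points and the approximate solution as 0-based lists.
    """
    h = (xb - xa) / (n - 1)
    x = [xa + k * h for k in range(n)]
    sub = [a(xk) / h**2 - b(xk) / (2 * h) for xk in x]   # multiplies y_{k-1}
    dia = [c(xk) - 2 * a(xk) / h**2 for xk in x]         # multiplies y_k
    sup = [a(xk) / h**2 + b(xk) / (2 * h) for xk in x]   # multiplies y_{k+1}
    rhs = [f(xk) for xk in x]

    # Left boundary: eliminate the virtual value y_0 or fix y_1 directly.
    if alpha[1] != 0.0:
        g = 2 * h / alpha[1]
        dia[0] += sub[0] * g * alpha[0]
        rhs[0] += sub[0] * g * lam[0]
        sup[0] += sub[0]
    else:
        dia[0], sup[0], rhs[0] = 1.0, 0.0, lam[0] / alpha[0]
    sub[0] = 0.0

    # Right boundary: eliminate the virtual value y_{N+1} or fix y_N directly.
    if beta[1] != 0.0:
        g = 2 * h / beta[1]
        dia[-1] -= sup[-1] * g * beta[0]
        rhs[-1] -= sup[-1] * g * lam[1]
        sub[-1] += sup[-1]
    else:
        dia[-1], sub[-1], rhs[-1] = 1.0, 0.0, lam[1] / beta[0]
    sup[-1] = 0.0

    # Thomas algorithm: forward elimination, then back substitution.
    for k in range(1, n):
        w = sub[k] / dia[k - 1]
        dia[k] -= w * sup[k - 1]
        rhs[k] -= w * rhs[k - 1]
    y = [0.0] * n
    y[-1] = rhs[-1] / dia[-1]
    for k in range(n - 2, -1, -1):
        y[k] = (rhs[k] - sup[k] * y[k + 1]) / dia[k]
    return x, y

# Illustrative test: y'' - y = 0, y(0) + y'(0) = 2, y(1) = e, exact solution exp(x).
x, y = solve_bvp_fd(lambda t: 1.0, lambda t: 0.0, lambda t: -1.0, lambda t: 0.0,
                    0.0, 1.0, alpha=(1.0, 1.0), beta=(1.0, 0.0),
                    lam=(2.0, math.e), n=101)
max_error = max(abs(yk - math.exp(xk)) for xk, yk in zip(x, y))
```

Since both the interior scheme and the boundary treatment use central differences, the error is expected to decrease roughly quadratically with the grid-spacing $h$.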
Let us briefly investigate the differential equation which corresponds to the
problem (8.4) together with periodic boundary conditions of the form (8.7). In this
case we have to consider that

$$ y_1 = y_N , \tag{8.31} $$

and, for a solution to exist, we have necessarily

$$ a_1 = a_N , \quad b_1 = b_N , \quad \text{and} \quad c_1 = c_N . \tag{8.32} $$

The finite difference approximations (8.8) and (8.9) are again applied to derive
Eqs. (8.12) for $k = 2, \ldots, N - 1$. For $k = 2$ Eq. (8.12) becomes

$$ \left( \frac{a_2}{h^2} - \frac{b_2}{2h} \right) y_1 + \left( c_2 - \frac{2 a_2}{h^2} \right) y_2 + \left( \frac{a_2}{h^2} + \frac{b_2}{2h} \right) y_3 = f_2 , \tag{8.33} $$

and we have for $k = N - 1$

$$ \left( \frac{a_{N-1}}{h^2} - \frac{b_{N-1}}{2h} \right) y_{N-2} + \left( c_{N-1} - \frac{2 a_{N-1}}{h^2} \right) y_{N-1} + \left( \frac{a_{N-1}}{h^2} + \frac{b_{N-1}}{2h} \right) y_N = f_{N-1} . \tag{8.34} $$

Since $y_1 = y_N$ this can be rewritten as

$$ \left( \frac{a_{N-1}}{h^2} - \frac{b_{N-1}}{2h} \right) y_{N-2} + \left( c_{N-1} - \frac{2 a_{N-1}}{h^2} \right) y_{N-1} + \left( \frac{a_{N-1}}{h^2} + \frac{b_{N-1}}{2h} \right) y_1 = f_{N-1} . \tag{8.35} $$

Finally, Eq. (8.12) results for $k = 1$ in

$$ \left( \frac{a_1}{h^2} - \frac{b_1}{2h} \right) y_{N-1} + \left( c_1 - \frac{2 a_1}{h^2} \right) y_1 + \left( \frac{a_1}{h^2} + \frac{b_1}{2h} \right) y_2 = f_1 , \tag{8.36} $$

where we identified $y_0 = y(x_1 - h) = y(x_N - h) = y_{N-1}$. All this results in a closed
system of $N - 1$ equations, which is of the form (8.25)

$$ A y = F , \tag{8.37} $$

where $y = (y_1, y_2, \ldots, y_{N-1})^T$,

$$ F = \begin{pmatrix} f_1 \\ f_2 \\ \vdots \\ f_{N-1} \end{pmatrix} , \tag{8.38} $$
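and the $(N - 1) \times (N - 1)$ matrix $A$ is given by

$$ A = \begin{pmatrix} B_1 & C_1 & 0 & \cdots & 0 & A_1 \\ A_2 & B_2 & C_2 & 0 & \cdots & 0 \\ 0 & \ddots & \ddots & \ddots & & \vdots \\ \vdots & & \ddots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & A_{N-2} & B_{N-2} & C_{N-2} \\ C_{N-1} & 0 & \cdots & 0 & A_{N-1} & B_{N-1} \end{pmatrix} . \tag{8.39} $$

Here, we defined

$$ A_k = \frac{a_k}{h^2} - \frac{b_k}{2h} , \qquad k = 1, \ldots, N - 1 , \tag{8.40} $$

$$ B_k = c_k - \frac{2 a_k}{h^2} , \qquad k = 1, \ldots, N - 1 , \tag{8.41} $$

and

$$ C_k = \frac{a_k}{h^2} + \frac{b_k}{2h} , \qquad k = 1, \ldots, N - 1 . \tag{8.42} $$

In contrast to the matrix (8.27) the matrix (8.39) is not tridiagonal since the
matrix elements $(A)_{1,N-1}$ and $(A)_{N-1,1}$ are non-zero. Nevertheless, it was possible
to reduce the boundary value problem to a system of linear equations which can be
solved iteratively.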
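The periodic case can be sketched in the same spirit. The code below assembles the $(N - 1) \times (N - 1)$ matrix including its two corner elements. Instead of an iterative solver, it uses plain Gaussian elimination with partial pivoting, chosen here only for brevity. The function names and the test problem $y'' - y = -2 \sin x$ on $[0, 2\pi]$, with periodic solution $y = \sin x$, are assumptions made for illustration.

```python
import math

def gauss_solve(matrix, rhs):
    """Solve a dense linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]   # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            w = m[r][col] / m[col][col]
            for j in range(col, n + 1):
                m[r][j] -= w * m[col][j]
    y = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * y[j] for j in range(i + 1, n))
        y[i] = (m[i][n] - s) / m[i][i]
    return y

def solve_periodic_fd(a, b, c, f, xa, xb, n):
    """Finite differences for a y'' + b y' + c y = f with y(xa) = y(xb) and N = n grid points."""
    h = (xb - xa) / (n - 1)
    size = n - 1                                  # unknowns y_1, ..., y_{N-1}
    x = [xa + k * h for k in range(size)]
    mat = [[0.0] * size for _ in range(size)]
    for k in range(size):
        ak, bk, ck = a(x[k]), b(x[k]), c(x[k])
        mat[k][(k - 1) % size] += ak / h**2 - bk / (2 * h)   # A_k (wraps around for k = 1)
        mat[k][k] += ck - 2 * ak / h**2                      # B_k
        mat[k][(k + 1) % size] += ak / h**2 + bk / (2 * h)   # C_k (wraps around for k = N-1)
    return x, gauss_solve(mat, [f(xk) for xk in x])

# Illustrative test: y'' - y = -2 sin(x) on [0, 2*pi] has the periodic solution sin(x).
x, y = solve_periodic_fd(lambda t: 1.0, lambda t: 0.0, lambda t: -1.0,
                         lambda t: -2.0 * math.sin(t), 0.0, 2.0 * math.pi, n=65)
max_error = max(abs(yk - math.sin(xk)) for xk, yk in zip(x, y))
```

For large $N$ a dense elimination becomes wasteful, and a solver that exploits the almost tridiagonal structure would be preferable.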


8.3  Shooting  Methods 

For  illustrative  purposes,  we  restrict  here  the  discussion  to  a  second  order  boundary 
value  problem  with  decoupled  boundary  conditions  of  the  form  (8.4).  The  essential 
idea  of  shooting  methods  is  to  treat  the  boundary  value  problem  as  an  initial  value 
problem.  The  resulting  equations  can  then  be  solved  with  the  help  of  methods 
discussed  in  Chap.  5.  Of  course,  such  an  approach  is  ill-defined  because  no  initial 
conditions  but  only  boundary  conditions  are  given.  The  trick  is,  that  one  modifies  the 
initial  conditions  iteratively  in  such  a  way  that  in  the  end  the  boundary  conditions 
are fulfilled. Let us put this train of thought into a mathematical form: We rewrite
the second order differential equation (8.4a) as

$$ y'' = f(y, y', x) , \tag{8.43} $$


which can be reduced to a set of first order differential equations as was demonstrated
in Chap. 5. We note that Eq. (8.43) is not yet well posed since the initial
conditions have not been defined. The boundary condition on the left-hand side
reads:

$$ \alpha_0 y(a) + \alpha_1 y'(a) = \lambda_1 . \tag{8.44} $$

We now assume that $y'(a) = z$, where $z$ is some number. This gives the well posed
initial value problem

$$ \begin{cases} y'' = f(y, y', x) , \\ y(a) = \dfrac{\lambda_1}{\alpha_0} - \dfrac{\alpha_1}{\alpha_0}\, z , \\ y'(a) = z , \end{cases} \tag{8.45} $$

under the assumption that $\alpha_0 \neq 0$. The solution of this problem will be written as
$y(x; z)$ in order to indicate its dependence on the particular choice $y'(a) = z$. We
remember that the boundary condition at the right-hand boundary is defined as:

$$ \beta_0 y(b) + \beta_1 y'(b) = \lambda_2 . \tag{8.46} $$

Let us introduce the function:

$$ F(z) = \beta_0 y(b; z) + \beta_1 y'(b; z) - \lambda_2 . \tag{8.47} $$

We observe that the solution of the equation

$$ F(z) = 0 \tag{8.48} $$

gives  the  desired  solution  to  the  boundary  value  problem  (8.4),  because  in  this  case 
the  second  boundary  condition  (8.46)  is  fulfilled.  In  practice,  one  tries  several  values 
of  z  until  relation  (8.48)  is  fulfilled.  However,  from  a  numerical  point  of  view  this 
method  is  very  inefficient  since  usually  several  initial  value  problems  have  to  be 
solved  until  the  correct  value  of  z  is  found.  Nevertheless,  in  some  cases  shooting 
methods  proved  to  be  very  useful  [7] . 
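A minimal sketch of this procedure is given below. Instead of trying values of $z$ by hand, it uses a secant iteration to find the root of $F(z)$, and a classical fourth-order Runge-Kutta integrator for the initial value problem. Both choices, as well as all function names and the test problem ($y'' = y$ on $[0, 1]$ with $y(0) + y'(0) = 2$ and $y(1) = e$, exact solution $y = e^x$, hence $z = 1$), are assumptions made for illustration. For a linear differential equation $F(z)$ depends linearly on $z$, so the secant iteration converges after essentially one step.

```python
import math

def rk4_shoot(f, xa, xb, y0, yp0, steps=200):
    """Integrate y'' = f(y, y', x) from xa to xb with RK4; return y(xb) and y'(xb)."""
    h = (xb - xa) / steps
    x, y, v = xa, y0, yp0
    for _ in range(steps):
        k1y, k1v = v, f(y, v, x)
        k2y, k2v = v + 0.5 * h * k1v, f(y + 0.5 * h * k1y, v + 0.5 * h * k1v, x + 0.5 * h)
        k3y, k3v = v + 0.5 * h * k2v, f(y + 0.5 * h * k2y, v + 0.5 * h * k2v, x + 0.5 * h)
        k4y, k4v = v + h * k3v, f(y + h * k3y, v + h * k3v, x + h)
        y += h * (k1y + 2 * k2y + 2 * k3y + k4y) / 6
        v += h * (k1v + 2 * k2v + 2 * k3v + k4v) / 6
        x += h
    return y, v

def shoot(f, xa, xb, alpha, beta, lam, z0=0.0, z1=0.5, tol=1e-10, max_iter=50):
    """Find z = y'(xa) such that the right-hand boundary condition is fulfilled."""
    def mismatch(z):
        y0 = lam[0] / alpha[0] - alpha[1] / alpha[0] * z     # requires alpha[0] != 0
        yb, vb = rk4_shoot(f, xa, xb, y0, z)
        return beta[0] * yb + beta[1] * vb - lam[1]          # F(z)
    f0, f1 = mismatch(z0), mismatch(z1)
    for _ in range(max_iter):
        if abs(f1) < tol or f1 == f0:
            break
        z0, z1, f0 = z1, z1 - f1 * (z1 - z0) / (f1 - f0), f1  # secant step
        f1 = mismatch(z1)
    return z1

# Illustrative test: y'' = y, y(0) + y'(0) = 2, y(1) = e; the exact slope is y'(0) = 1.
z = shoot(lambda y, v, x: y, 0.0, 1.0, alpha=(1.0, 1.0), beta=(1.0, 0.0), lam=(2.0, math.e))
```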

For instance, shooting methods are particularly effective if a solution to an
eigenvalue problem of the form

$$ a(x) y''(x) + b(x) y'(x) + c(x) y(x) = \lambda y(x) , \tag{8.49a} $$

in combination with homogeneous boundary conditions,

$$ \alpha_0 y(a) + \alpha_1 y'(a) = 0 , \tag{8.49b} $$

and

$$ \beta_0 y(b) + \beta_1 y'(b) = 0 , \tag{8.49c} $$

is to be found. We note that Eq. (8.49a) has the trivial solution $y(x) = 0$ for all
values of $\lambda$. However, a non-trivial solution will only exist for particular values of $\lambda$.
These particular values will be indexed by $\lambda_n$ and are referred to as eigenvalues of
Eq. (8.49a) [9]. The corresponding functions $y_n(x)$ are referred to as eigenfunctions.
We note that the differential equation (8.49a) in combination with the boundary
conditions (8.49b) and (8.49c) defines a homogeneous boundary value problem. Such
a problem is commonly referred to as an eigenvalue problem [9]. Furthermore, we
note the following property of homogeneous boundary value problems: Suppose
that $y(x)$ is a solution of the boundary value problem (8.49), then $\tilde{y}(x) = \gamma y(x)$, with
$\gamma = {\rm const}$, will also be a solution of (8.49). Hence, the solution of a homogeneous
boundary value problem is not unique but invariant under multiplication by a
constant $\gamma$. Typically, the multiplicative factor $\gamma$ is fixed by some additional
condition, such as a normalization condition of the form

$$ \int_a^b {\rm d}x\, |y(x)|^2 = 1 . \tag{8.50} $$

We now employ this property and choose $y(a) = 1$. Inserting this choice
into (8.49b) yields

$$ y'(a) = -\frac{\alpha_0}{\alpha_1} . \tag{8.51} $$

Note that for $\alpha_0 = 0$ or $\alpha_1 = 0$, we are restricted to the choices $y'(a) = 0$ with
$y(a)$ arbitrary or $y(a) = 0$ with $y'(a)$ arbitrary, respectively. If we assume that
$a(x) \neq 0$ for all $x \in [a, b]$, we can solve the initial value problem

$$ \begin{cases} y'' = -\dfrac{b(x)}{a(x)}\, y' - \dfrac{c(x) - \lambda}{a(x)}\, y , \\ y(a) = 1 , \\ y'(a) = -\dfrac{\alpha_0}{\alpha_1} . \end{cases} \tag{8.52} $$

The solutions are denoted by $y(x; \lambda)$ in order to emphasize that they will highly
depend on the choice of the parameter $\lambda$. The strategy is to solve the initial value
problem (8.52) for several values of $\lambda$ and whenever one finds that

$$ F(\lambda_n) = \beta_0\, y(b; \lambda_n) + \beta_1\, y'(b; \lambda_n) = 0 \tag{8.53} $$

is satisfied, an eigenvalue $\lambda_n$ with the corresponding eigenfunction $y_n(x) = y(x; \lambda_n)$
of the eigenvalue problem (8.49) has been found.

However, this strategy is also very time consuming. The most common application
of the shooting method is its combination with a very fast and accurate solution
of initial value problems. This method is known as the Numerov method [7]. It is
applicable whenever one is confronted with a differential equation of the form

$$ y''(x) + k(x) y(x) = 0 , \tag{8.54} $$

in combination with homogeneous boundary conditions. Here $k(x)$ is some function.
If we are particularly interested in eigenvalue problems then $k(x)$ has the form
$k(x) = q(x) - \lambda$, where $q(x)$ is some function and $\lambda$ is the eigenvalue [see the
discussion after Eq. (8.49)]. For instance, consider the one-dimensional stationary
Schrödinger equation [10-12],

$$ \psi''(x) + \frac{2m}{\hbar^2} \left[ E - V(x) \right] \psi(x) = 0 , \tag{8.55} $$

where $\psi(x)$ is the wave-function, $m$ is the mass, $\hbar$ denotes the reduced Planck
constant, $E$ is the energy, and $V(x)$ is some potential. In this case we identify

$$ k(x) = \frac{2m}{\hbar^2} \left[ E - V(x) \right] . \tag{8.56} $$

We note that Eq. (8.55) together with its boundary conditions defines an eigenvalue
problem with eigenvalues $E_n$, the possible energies of the system. We remember
from Chap. 2, Eq. (2.34), that

$$ \frac{y_{j+1} - 2 y_j + y_{j-1}}{h^2} = y_j'' + \frac{h^2}{12}\, y_j^{(4)} + \ldots = -k_j y_j + \frac{h^2}{12}\, y_j^{(4)} + \ldots \tag{8.57} $$

Here we made use of Eq. (8.54) and introduced $k_j = k(x_j)$. Furthermore, we write
the fourth derivative of $y(x)$ at point $x = x_j$ as

$$ y_j^{(4)} \approx \frac{y_{j+1}'' - 2 y_j'' + y_{j-1}''}{h^2} = \frac{-k_{j+1} y_{j+1} + 2 k_j y_j - k_{j-1} y_{j-1}}{h^2} , \tag{8.58} $$

where we employed Eq. (8.54). Truncating (8.57) after the fourth order derivative
$y_j^{(4)}$, inserting relation (8.58), and solving for $y_{j+1}$ yields

$$ y_{j+1} = \frac{2 \left( 1 - \frac{5 h^2}{12} k_j \right) y_j - \left( 1 + \frac{h^2}{12} k_{j-1} \right) y_{j-1}}{1 + \frac{h^2}{12} k_{j+1}} . \tag{8.59} $$

This gives a very fast algorithm to solve the differential equation (8.54) with some
initial values of the form (8.52). The remaining strategy is the same as discussed
above, i.e. one screens the parameter $\lambda$ in order to find the eigenvalues $\lambda_n$ and
eigenfunctions $y_n(x)$. In case of the Schrödinger equation, one can screen the
energy $E$ in order to obtain the energy eigenvalues $E_n$ which satisfy a condition of
the form (8.53).

Before we present two illustrating examples in the next two chapters let us
conclude this section with two important remarks on the Numerov method. We
note from Eq. (8.59) that in order to compute $y_3$ one already needs the function
values $y_1$ and $y_2$. Usually, one obtains these values from the boundary conditions
in combination with some additional condition for the problem at hand. Such an
additional condition might be, for instance, the normalization of the function $y(x)$,
like Eq. (8.50). We also emphasized that one has to run the Numerov algorithm
several times for different trial values of the parameter $\lambda$. In order to reduce the
computational cost of the method it is in many cases advantageous to store the
function values $q_j$, where $k_j = q_j - \lambda$, in an array which is then regarded as an
input argument of the Numerov algorithm.
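The following sketch combines the Numerov recursion with a search for an eigenvalue. Several choices are assumptions made for illustration: units with $\hbar = m = 1$, a particle in a box with $V(x) = 0$ on $[0, 1]$ and $\psi(0) = \psi(1) = 0$, the start values $y_1 = 0$ and $y_2 = h$ (the overall scale is arbitrary for a homogeneous problem), and a bisection on the energy in place of a manual scan. The potential values are stored once in a list and passed to the routine, in the spirit of the remark above.

```python
import math

def numerov_end_value(k, h, y1, y2):
    """Propagate y'' + k(x) y = 0 with the Numerov recursion; k holds k_j on the grid."""
    y_prev, y = y1, y2
    c = h * h / 12.0
    for j in range(1, len(k) - 1):
        y_next = (2.0 * (1.0 - 5.0 * c * k[j]) * y
                  - (1.0 + c * k[j - 1]) * y_prev) / (1.0 + c * k[j + 1])
        y_prev, y = y, y_next
    return y                                   # value at the right-hand boundary

def bisect_energy(potential, h, e_lo, e_hi, tol=1e-10):
    """Bisection on E for psi(b; E) = 0, assuming the bracket contains one sign change."""
    def mismatch(energy):
        k = [2.0 * (energy - vj) for vj in potential]   # k_j = 2 (E - V_j)
        return numerov_end_value(k, h, 0.0, h)
    f_lo = mismatch(e_lo)
    if f_lo * mismatch(e_hi) > 0.0:
        raise ValueError("bracket does not enclose a sign change")
    while e_hi - e_lo > tol:
        mid = 0.5 * (e_lo + e_hi)
        f_mid = mismatch(mid)
        if f_lo * f_mid <= 0.0:
            e_hi = mid
        else:
            e_lo, f_lo = mid, f_mid
    return 0.5 * (e_lo + e_hi)

# Particle in a box on [0, 1]: the exact ground state energy is pi^2 / 2 in these units.
n_points = 101
h = 1.0 / (n_points - 1)
potential = [0.0] * n_points
e_ground = bisect_energy(potential, h, 4.0, 6.0)
```

Higher eigenvalues can be located in the same way by choosing brackets around further sign changes of the boundary mismatch.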


Summary 

We  focused  on  linear  boundary  value  problems  defined  on  a  finite  interval  [a,  b]  C 
R.  Most  important  for  physics  are  second  order  boundary  value  problems  with 
decoupled  boundary  conditions,  i.e.  the  boundary  conditions  at  the  two  different 
boundaries  do  not  mix.  The  numerical  treatment  of  the  second  order  differential 
equation  together  with  its  boundary  conditions  concentrated  either  on  the  applica¬ 
tion  of  finite  differences  or  on  shooting  methods.  In  the  finite  difference  approach 
the  methods  developed  in  Chap.  2  were  applied  and  the  boundary  conditions  were 
incorporated  directly.  This  resulted  in  a  set  of  linear  algebraic  equations  which 
was  to  be  solved  for  each  grid-point  of  the  discretisized  interval  [a,  b\.  The  case 
of  periodic  boundary  conditions  was  also  discussed  in  detail. 

The  shooting  methods,  on  the  other  hand,  try  to  link  the  decoupled  boundary 
value  problem  to  an  initial  value  problem.  This  allowed  the  application  of  the 
methods  discussed  in  Chap.  5.  The  idea  was  to  start  with  some  initial  value  at  one 
of  the  two  boundaries,  solve  the  differential  equation  numerically  and  to  modify 
the  initial  value  iteratively  until  it  agreed  with  the  original  boundary  condition 
within some predefined error. Such a procedure is rather time consuming. Nevertheless, shooting methods, in particular the Numerov variant, proved to be very useful in the numerical solution of eigenvalue problems. This was demonstrated using the homogeneous boundary value problem of the one-dimensional stationary Schrödinger equation as an example.


Problems

1. Solve the linear second order boundary value problem

$$y''(x) + y(x) = x ,$$

for $x \in [0, \pi/2]$ with the boundary conditions $y(0) = 0$ and $y(\pi/2) = 1$ analytically and then numerically with the help of finite differences.

2. Solve the linear, second order boundary value problem

$$y''(x) - 2 \cos(2x)\, y(x) = 0 ,$$

on the interval $[-\pi/2, \pi/2]$ and with the boundary conditions $y(\pm\pi/2) = 1$. Use the finite difference method.

Comment:  The  solution  can  be  expressed  analytically  in  terms  of  so-called 
Mathieu  functions  [13,  14]  which  might  be  intrinsically  available  from  your 
computing  environment.  If  this  happens  to  be  the  case,  it  might  be  a  good  idea  to 
compare the numerical solution with the analytical result.


Chapter 9

The One-Dimensional Stationary Heat Equation


9.1  Introduction 

This  is  the  first  of  two  chapters  which  illustrate  the  applicability  of  the  methods 
introduced  in  Chap.  8.  Within  this  chapter  the  finite  difference  approach  is  employed 
to solve the stationary heat equation. Let us briefly motivate this particular problem. We consider a rod of length $L$ which is supposed to be kept at constant temperatures $T_0$ and $T_N$ at its ends as illustrated in Fig. 9.1. The homogeneous heat equation is a linear partial differential equation of the form

$$\frac{\partial}{\partial t} T(x,t) = k \Delta T(x,t) . \tag{9.1}$$

Here $T(x,t)$ is the temperature as a function of space $x \in \mathbb{R}^3$ and time $t \in \mathbb{R}$, $\Delta = \nabla^2 = \partial_x^2 + \partial_y^2 + \partial_z^2$ is the Laplace operator, and $k = \mathrm{const}$ is the thermal diffusivity.

We remark that Eq. (9.1) is a partial differential equation which has to be supplemented by initial and boundary conditions. Moreover, we note in passing that the heat equation is equivalent to the diffusion equation [1]

$$\frac{\partial}{\partial t} \rho(x,t) = D \Delta \rho(x,t) , \tag{9.2}$$

with particle density $\rho(x,t)$ and the diffusion coefficient $D = \mathrm{const}$. Here we restrict ourselves to a simplified situation in order to test the validity of the finite difference approach discussed in Sect. 8.2. The general solution of the heat or diffusion equation will be discussed in Sect. 11.3. (The problem of the one-dimensional heat equation was studied in all conceivable detail by J. R. Cannon [2].)

Fig. 9.1 We consider a rod of length $L$. Its ends are kept at constant temperatures $T_0$ and $T_N$, respectively

If we assume that the cylindrical surface of the rod is perfectly insulated, we can restrict the problem to a one-dimensional problem. Furthermore, we assume that the steady-state has been reached, i.e. $\frac{\partial}{\partial t} T(x,t) = 0$. Hence, the remaining boundary value problem is of the form


$$\begin{aligned} \frac{d^2}{dx^2} T(x) &= 0 , \quad x \in [0, L] , \\ T(0) &= T_0 , \\ T(L) &= T_N . \end{aligned} \tag{9.3}$$

The solution can easily be found analytically and one obtains

$$T(x) = T_0 + (T_N - T_0) \frac{x}{L} . \tag{9.4}$$

In  the  following  section  we  will  apply  the  approach  of  finite  differences  to  the 
boundary  value  problem  (9.3)  as  discussed  in  Sect.  8.2. 


9.2  Finite  Differences 


We discretize the interval $[0, L]$ according to Chap. 2 by the introduction of $N + 1$ grid-points $x_n = nh$, with $h = L/N$, $x_0 = 0$, and $x_N = L$. Furthermore, $T_n \equiv T(x_n)$ and, in particular, we refer to the boundary values of Eq. (9.3) as $T_0$ and $T_N$, respectively. On the basis of this discretization, we approximate Eq. (9.3) by

$$\frac{T_{n+1} - 2 T_n + T_{n-1}}{h^2} = 0 , \tag{9.5}$$

or equivalently

$$T_{n+1} - 2 T_n + T_{n-1} = 0 . \tag{9.6}$$

We can rewrite this as a matrix equation,

$$A T = F , \tag{9.7}$$

where the boundary conditions have already been included. In Eq. (9.7) the vector $T = (T_1, T_2, \ldots, T_{N-1})^T$. Furthermore, the tridiagonal matrix $A$ is given by

$$A = \begin{pmatrix} -2 & 1 & 0 & \cdots & & 0 \\ 1 & -2 & 1 & 0 & \cdots & 0 \\ 0 & 1 & -2 & 1 & & \vdots \\ \vdots & & \ddots & \ddots & \ddots & \\ 0 & \cdots & & & 1 & -2 \end{pmatrix} , \tag{9.8}$$

and the vector $F$ by

$$F = \begin{pmatrix} -T_0 \\ 0 \\ \vdots \\ 0 \\ -T_N \end{pmatrix} . \tag{9.9}$$


It is an easy task to solve Eq. (9.7) analytically. It follows from Eq. (9.6) that

$$T_{n+1} = 2 T_n - T_{n-1} , \quad n = 1, \ldots, N - 1 . \tag{9.10}$$

We insert $n = 1, 2, 3$ in order to obtain

$$T_2 = 2 T_1 - T_0 , \tag{9.11}$$

$$T_3 = 2 T_2 - T_1 = 3 T_1 - 2 T_0 , \tag{9.12}$$

$$T_4 = 2 T_3 - T_2 = 4 T_1 - 3 T_0 . \tag{9.13}$$

We recognize the pattern and conclude that $T_n$ has the general form

$$T_n = n T_1 - (n - 1) T_0 , \tag{9.14}$$

which we prove by complete induction:

$$\begin{aligned} T_{n+1} &= 2 T_n - T_{n-1} \\ &= 2 \left[ n T_1 - (n - 1) T_0 \right] - \left[ (n - 1) T_1 - (n - 2) T_0 \right] \\ &= (n + 1) T_1 - n T_0 . \end{aligned} \tag{9.15}$$

Hence, expression (9.14) is valid for all $n = 1, \ldots, N$. However, since $T_N$ is kept constant according to the boundary condition, we can determine $T_1$ from

$$T_N = N T_1 - N T_0 + T_0 , \tag{9.16}$$

which yields

$$T_1 = \frac{T_N - T_0}{N} + T_0 . \tag{9.17}$$

Inserting (9.17) into (9.14) gives

$$T_n = T_0 + (T_N - T_0) \frac{n}{N} = T_0 + (T_N - T_0) \frac{x_n}{L} , \tag{9.18}$$


which is exactly the discretized version of Eq. (9.4). Hence the finite difference approach to the boundary value problem (9.3) is exact and independent of the grid-spacing $h$. This is not surprising since we proved already in Chap. 2 that finite difference derivatives are exact for linear functions.
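
The exactness claimed above can be checked numerically. The following is only a minimal sketch, assuming NumPy is available; the function name `solve_homogeneous_rod` and the chosen values of N, T_0, T_N and L are illustrative assumptions, not part of the text. It builds the matrix of Eq. (9.8) and the right hand side of Eq. (9.9), solves Eq. (9.7) with a dense solver, and compares the result with Eq. (9.4).

```python
import numpy as np

def solve_homogeneous_rod(T0, TN, N):
    """Solve A T = F, Eq. (9.7), and return T_0, ..., T_N (hypothetical helper)."""
    # Tridiagonal matrix of Eq. (9.8): -2 on the diagonal, 1 on both off-diagonals.
    A = (np.diag(-2.0 * np.ones(N - 1))
         + np.diag(np.ones(N - 2), 1)
         + np.diag(np.ones(N - 2), -1))
    # Right hand side of Eq. (9.9): the boundary values enter the first and last row.
    F = np.zeros(N - 1)
    F[0] -= T0
    F[-1] -= TN
    interior = np.linalg.solve(A, F)
    return np.concatenate(([T0], interior, [TN]))

# Illustrative parameters (assumed for this sketch).
N, T0, TN, L = 10, 0.0, 2.0, 10.0
x = np.linspace(0.0, L, N + 1)
T_numeric = solve_homogeneous_rod(T0, TN, N)
T_exact = T0 + (TN - T0) * x / L          # Eq. (9.4)
max_dev = np.max(np.abs(T_numeric - T_exact))
print(max_dev)                             # deviation at rounding-error level
```

A dense solver is used here only for brevity; for large N a dedicated tridiagonal solver would be the more economical choice.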


9.3  A  Second  Scenario 


We consider the inhomogeneous heat equation

$$\frac{\partial}{\partial t} T(x,t) = k \Delta T(x,t) - \Gamma(x,t) . \tag{9.19}$$

Here $\Gamma(x,t) = \Gamma(x)$ is some heat source or heat drain, which is assumed to be independent of time $t$. Again, we consider the one-dimensional, stationary case, i.e.

$$\frac{d^2}{dx^2} T(x) = \frac{1}{k} \Gamma(x) , \tag{9.20}$$

with the same boundary conditions as in Eq. (9.3). Furthermore, we assume $\Gamma(x)$ to be of the form

$$\Gamma(x) = \frac{\Theta}{\ell} \exp\left[ -\frac{(x - L/2)^2}{\ell^2} \right] , \tag{9.21}$$

i.e. $\Gamma(x)$ has the form of a Gauss peak which is centered at $x = L/2$, has a width determined by the parameter $\ell$, and an amplitude determined by the constant $\Theta$. Such a situation might occur, for instance, when the rod is heated with some kind of a heat gun or cooled by a cold spot. (In cases where the diffusion equation (9.2) is used to describe the random motion of electrons in some device, one can imagine that the density of electrons $\rho$ is constant at the contacts at $x = 0$ and $x = L$. The source/drain term $\Gamma(x)$ then accounts for a constant generation or recombination rate of electrons, for instance, through incoming light or intrinsic traps, respectively [3].)

Furthermore, we note that in the limiting case $\ell \to 0$ we have

$$\lim_{\ell \to 0} \Gamma(x) \propto \Theta\, \delta(x - L/2) , \tag{9.22}$$

where $\delta(\cdot)$ is the Dirac $\delta$-distribution; in this case the spatial extension of the source/drain term $\Gamma(x)$ is infinitesimal.

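We now employ the results of Sect. 8.2 and rewrite the system of equations in the familiar form¹

$$A T = F , \tag{9.23}$$

where $A$ has already been defined in Eq. (9.8), $T = (T_1, T_2, \ldots, T_{N-1})^T$, and $F$ is given by

$$F = \begin{pmatrix} \frac{h^2}{k} \Gamma_1 - T_0 \\ \frac{h^2}{k} \Gamma_2 \\ \vdots \\ \frac{h^2}{k} \Gamma_{N-2} \\ \frac{h^2}{k} \Gamma_{N-1} - T_N \end{pmatrix} . \tag{9.24}$$

Here we used the notation $\Gamma_n = \Gamma(x_n)$.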

The system is solved numerically quite easily using methods discussed by Press et al. [4] for the solution of sets of algebraic equations of the kind (9.23) with tridiagonal matrix $A$. We chose $L = 10$, $k = 1$, $\Theta = -0.4$, $\ell = 1$, $T_0 = 0$ and $T_N = 2$. The resulting temperature profiles $T(x)$ (solid line) for different values of $N$ can be found in Figs. 9.2, 9.3, and 9.4 together with the respective form of the function $\Gamma(x)$ (dashed line). With increasing number of steps we see, as was to be expected, a refinement of the temperature profile. Its maximum does not quite coincide with the minimum of $\Gamma(x)$; it is shifted slightly towards the end of the rod because of the boundary conditions, i.e. $T_0 < T_N$.
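
As an illustration of how such a tridiagonal system can be treated, the following sketch uses the Thomas algorithm (forward elimination followed by back substitution). It is not taken from the reference cited above; the function names `thomas` and `heated_rod` are hypothetical, and the Gaussian source is written with the prefactor Θ/ℓ, which is an assumption that makes no difference for the value ℓ = 1 used here. NumPy is assumed to be available.

```python
import numpy as np

def thomas(lower, diag, upper, rhs):
    """Solve a tridiagonal system; lower[0] and upper[-1] are not used."""
    n = len(diag)
    c = np.zeros(n)
    d = np.zeros(n)
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):                       # forward elimination
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    sol = np.zeros(n)
    sol[-1] = d[-1]
    for i in range(n - 2, -1, -1):              # back substitution
        sol[i] = d[i] - c[i] * sol[i + 1]
    return sol

def heated_rod(N, L=10.0, k=1.0, theta=-0.4, ell=1.0, T0=0.0, TN=2.0):
    """Return the grid x_n and the temperatures T_n for the source of Eq. (9.21)."""
    h = L / N
    x = np.linspace(0.0, L, N + 1)
    gamma = theta / ell * np.exp(-((x - L / 2) ** 2) / ell**2)  # assumed prefactor
    rhs = h**2 / k * gamma[1:N]                 # interior entries of F, Eq. (9.24)
    rhs[0] -= T0
    rhs[-1] -= TN
    n = N - 1
    interior = thomas(np.ones(n), -2.0 * np.ones(n), np.ones(n), rhs)
    return x, np.concatenate(([T0], interior, [TN]))

x, T = heated_rod(100)
print(x[np.argmax(T)], T.max())                 # position and value of the maximum
```

The cost of the Thomas algorithm grows only linearly with the number of grid-points, which is why it is usually preferred over a dense solver for matrices of the form (9.8).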


¹ We note that Eq. (9.20) can also be solved with the help of Fourier transforms, see Appendix D.

Fig. 9.2 Temperature profile $T(x)$ (solid line, left hand scale) and the source function $\Gamma(x)$ (dashed line, right hand scale) for $N = 5$

Fig. 9.3 Temperature profile $T(x)$ (solid line, left hand scale) and the source function $\Gamma(x)$ (dashed line, right hand scale) for $N = 10$

Fig. 9.4 Temperature profile $T(x)$ (solid line, left hand scale) and the source function $\Gamma(x)$ (dashed line, right hand scale) for $N = 100$


Summary

The methods of Sect. 8.2 were applied to find the numerical solution of the stationary heat equation with Dirichlet boundary conditions. We studied the particular case of an insulated rod of length $L$. This reduced the dimensionality of the differential equation to one. The rod was then discretized using $N + 1$ grid-points. Using finite differences the one-dimensional ordinary second order differential equation which described this particular problem was transformed into a set of linear algebraic equations which determined the temperatures at each grid-point. This set of algebraic equations was characterized by a tridiagonal coefficient matrix. Solutions have been studied without and with a heat source which was described as a 'point' source characterized by a Gaussian of given width and amplitude. In the first case analytic solutions were easily derived. They described a linear temperature profile increasing (decreasing) from $T_0$ to $T_N$. In the latter case solutions were generated numerically using specific algorithms designed for sets of algebraic equations with a tridiagonal coefficient matrix $A$.


Problems 

1. Calculate the stationary temperature profile across the cylindrical rod of Fig. 9.1 which is exposed to a heat sink centered around $x = L/2$. This heat sink is described by a function $\Gamma(x)$ which is of rectangular shape of width $a$ and depth $\Theta$. Both ends of the rod are kept at constant temperatures $T_0$ and $T_N$, respectively.

2. Investigate the three cases $T_0 > T_N$, $T_0 < T_N$, $T_0 = T_N > 0$, and study the influence of the width $a$ of the heat sink on the temperature profile.

3. Consider the one-dimensional drift-diffusion equation [5]

$$\frac{\partial}{\partial t} \rho(x,t) = -D_1 \frac{\partial}{\partial x} \rho(x,t) + D_2 \frac{\partial^2}{\partial x^2} \rho(x,t) + \Gamma(x,t) ,$$

where $D_1$ is the drift constant and $D_2$ the diffusion constant. For instance, if we want to model the electron density in an electronic device, the drift constant would be in the simplest case $D_1 = \mu E$, where $\mu$ is the charge carrier mobility and $E$ is the x-component of the electric field.

Discretize the above equation for the stationary case and solve it numerically for different values of $D_1$. The boundary conditions are $\rho(0) > 0$ and $\rho(L) > \rho(0)$, where $L$ denotes the length of the device. Solve it numerically and analytically for $\Gamma(x) = 0$ and compare the results. Investigate also a scenario with a generation rate $\Gamma$ given by an exponential function like $\Gamma(x) = \Gamma_0 \exp(-\lambda x)$ with $\lambda > 0$.


Chapter 10

The One-Dimensional Stationary Schrödinger Equation

10.1  Introduction 


The numerical solution of the stationary Schrödinger equation is discussed to illustrate the application of Numerov's shooting method introduced in Sect. 8.3.

We start the discussion with a brief survey of basic quantum mechanics. Of course, this chapter is not supposed to give a self-contained introduction to this field and the reader not familiar with quantum mechanics should, therefore, regard the following discussion from a purely mathematical point of view. For more in-depth reading on quantum mechanics we refer to the books [1-4], to name a few.

A quantum-mechanical wave-function $\Psi = \Psi(x,t) \in \mathbb{C}$ is a function of time $t \in \mathbb{R}^+$ and space $x \in \mathbb{R}^3$ and obeys the Schrödinger equation:

$$i\hbar \frac{\partial}{\partial t} \Psi = H \Psi . \tag{10.1}$$

Here, $\hbar = h/(2\pi)$ is the reduced Planck constant, $i$ is the imaginary unit, and $H$ is the Hamilton operator or Hamiltonian. If $H \neq H(t)$, i.e. the Hamiltonian is independent of time $t$, we can employ a product ansatz

$$\Psi(x,t) = \exp\left( -\frac{i}{\hbar} E t \right) \psi(x) , \tag{10.2}$$

where $E$ is the energy and $\psi(x)$ is the time-independent part of the wave-function. This ansatz transforms Eq. (10.1) into

$$i\hbar \frac{\partial}{\partial t} \left[ \exp\left( -\frac{i}{\hbar} E t \right) \psi(x) \right] = i\hbar \left( -\frac{i}{\hbar} E \right) \exp\left( -\frac{i}{\hbar} E t \right) \psi(x) = \exp\left( -\frac{i}{\hbar} E t \right) H \psi(x) , \tag{10.3}$$

and $\psi(x)$ is determined by the eigenvalue problem [5]

$$H \psi = E \psi , \tag{10.4}$$

augmented by appropriate boundary conditions. We already came across Eq. (10.4) when we discussed shooting methods in Sect. 8.3.

The one-particle Hamiltonian is of the general form

$$H = T + V = \frac{P^2}{2m} + V , \tag{10.5}$$

with the kinetic energy operator $T$, the potential operator $V$, the momentum operator $P$, and the particle's mass $m$. If the system is not exposed to an external magnetic field, $P$ can be expressed in position space by

$$P = -i\hbar \nabla , \tag{10.6}$$

and the potential operator $V$ by $V(x)$. Thus we get for Eq. (10.5):

$$H = -\frac{\hbar^2}{2m} \Delta + V(x) . \tag{10.7}$$

Hence, we have to solve the linear, second order partial differential equation:

$$-\frac{\hbar^2}{2m} \Delta \psi(x) + V(x) \psi(x) = E \psi(x) . \tag{10.8}$$

This equation will certainly not have solutions for arbitrary values of the energy $E$. The particular values $E = E_n$,¹ for which it has a solution are referred to as eigenenergies and the corresponding solution $\psi_n(x)$ is referred to as eigenfunction to the eigenenergy $E_n$ [1-3, 5]. To emphasize this point we rewrite Eq. (10.8) as:

$$-\frac{\hbar^2}{2m} \Delta \psi_n(x) + V(x) \psi_n(x) = E_n \psi_n(x) , \quad \forall n \in \mathbb{N} . \tag{10.9}$$

It is the purpose of this chapter to develop a numerical procedure which will, in the end, allow us to calculate numerically the eigenvalues $E_n$ and eigenfunctions $\psi_n(x)$ as solutions of this equation.

¹ It depends on the problem at hand whether the index $n$ is continuous or discrete. For simplicity, we assume here $n$ to be discrete.

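We proceed in our analysis by defining the scalar product between two functions $\chi(x)$ and $\varphi(x)$

$$\langle \chi | \varphi \rangle = \int dx\, \chi^*(x) \varphi(x) , \tag{10.10}$$

where $\chi^*(x)$ denotes the complex conjugate of $\chi(x)$. The corresponding $L^2$-norm reads

$$\| \chi \| = \sqrt{\langle \chi | \chi \rangle} . \tag{10.11}$$

The expectation value of an operator $O$ in the quantum mechanical state $\Psi$ is given by

$$\langle O \rangle = \frac{\langle \Psi | O | \Psi \rangle}{\langle \Psi | \Psi \rangle} = \frac{\int dx\, \Psi^*(x) O \Psi(x)}{\int dx\, |\Psi(x)|^2} . \tag{10.12}$$

It follows from Eq. (10.4) that the energy is the expectation value of the Hamiltonian $H$

$$E = \langle H \rangle = \frac{\int dx\, \psi^*(x) H \psi(x)}{\int dx\, |\psi(x)|^2} . \tag{10.13}$$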


We quote now some important properties; a detailed discussion can be found in any textbook on quantum mechanics.

• The expectation value $\langle O \rangle$ of a Hermitian operator $O$, $O^\dagger = O$, is real, i.e. $\langle O \rangle = \langle O \rangle^*$. Here $O^\dagger$ denotes the adjoint of $O$, i.e. $O^\dagger = (O^*)^T$.

• Every real expectation value can be described by a Hermitian operator.

• All observables can be described by Hermitian operators, in particular, the Hamiltonian has to be a Hermitian operator to ensure that the eigenenergies $E_n$ are real, $E_n \in \mathbb{R}$.

• It follows from the hermiticity of $H$ that the eigenfunctions $\psi_n(x)$ form a complete, orthogonal basis in Hilbert space [5]. Furthermore, the functions can be normalized and the relation

$$\langle \psi_n | \psi_m \rangle = \delta_{nm} , \tag{10.14}$$

holds, with $\delta_{nm}$ the Kronecker-$\delta$.

Thus, with the help of Eq. (10.14) we rewrite the expectation value (10.12) of a Hermitian operator $O$ as:

$$\langle O \rangle = \int dx\, \Psi^*(x) O \Psi(x) . \tag{10.15}$$

In a next step we define the wave function $\Psi_n(x,t)$ following the ansatz (10.2)

$$\Psi_n(x,t) = \exp\left( -\frac{i}{\hbar} E_n t \right) \psi_n(x) , \tag{10.16}$$

and the total wave-function $\Psi(x,t)$ is then a superposition² of wave-functions $\Psi_n(x,t)$

$$\Psi(x,t) = \sum_n c_n \Psi_n(x,t) , \tag{10.17}$$

because the $\Psi_n(x,t)$ constitute a complete, orthogonal basis. Furthermore, we demand $\Psi(x,t)$ to be normalized for all $t$. Employing Eq. (10.14) in Eq. (10.17) yields

$$\int dx\, |\Psi(x,t)|^2 = \sum_n |c_n|^2 = 1 . \tag{10.18}$$

We quote Born's interpretation of the squared modulus of the total wave-function (referred to as Born's rule):

$|\Psi(x,t)|^2\, dx$ = the probability that the particle described by the wave-function $\Psi(x,t)$ can be found at time $t$ within a volume $dx$ around the point $x$. (10.19)

² This is only possible because the Schrödinger equation is linear.

This interpretation justifies the requirement of a normalization of the wave-function $\Psi(x,t)$

$$\int dx\, |\Psi(x,t)|^2 = 1 , \tag{10.20}$$

because, by definition, the particle has to be found somewhere at any time.

Suppose we start with an initial state $\chi(x) = \Psi(x, t = 0)$. Since the functions $\psi_n(x)$ form a complete basis in Hilbert space, $\chi(x)$ may be written with the help of Eq. (10.17) as

$$\chi(x) = \sum_n c_n \psi_n(x) . \tag{10.21}$$

We deduce from Eq. (10.14) that

$$\langle \psi_m | \chi \rangle = \sum_n c_n \int dx\, \psi_m^*(x) \psi_n(x) = c_m . \tag{10.22}$$


Consequently, $|c_m|^2$ is the probability that the particle was initially in state $m$. This allows us to interpret Eq. (10.17) in the following way: The coefficients $c_m$ determine the composition of the initial state. The exponential factor describes an oscillation and we note that different eigenfunctions, which correspond to different eigenenergies, oscillate with different frequencies. This can, for instance, induce the spreading (dispersion) of a wave packet.
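
The projection onto the eigenfunctions can be made concrete with a small numerical sketch. It assumes, purely for illustration, the particle-in-a-box eigenfunctions √2 sin(nπs) on the unit interval (they are derived in the next section) and a hypothetical parabolic initial state; the names `box_eigenfunction` and `integrate` are not from the text, and NumPy is assumed to be available.

```python
import numpy as np

def box_eigenfunction(n, s):
    """Normalized eigenfunction sqrt(2) sin(n pi s) of the box on [0, 1]."""
    return np.sqrt(2.0) * np.sin(n * np.pi * s)

def integrate(f, s):
    """Composite trapezoidal rule on the grid s."""
    return np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(s))

s = np.linspace(0.0, 1.0, 2001)
# Hypothetical initial state: a parabola, normalized such that its norm is one.
chi = np.sqrt(30.0) * s * (1.0 - s)

# Coefficients c_m as scalar products of the eigenfunctions with the initial state.
coeffs = np.array([integrate(box_eigenfunction(m, s) * chi, s) for m in range(1, 21)])
total = np.sum(coeffs**2)

print(coeffs[:4])   # even-m coefficients vanish by symmetry
print(total)        # close to one: the probabilities |c_m|^2 add up
```

In this example almost all of the weight sits in the ground state, because the parabola closely resembles the lowest eigenfunction.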


10.2  A  Simple  Example:  The  Particle  in  a  Box 

We concentrate on the one-dimensional Schrödinger equation and discuss a simple problem which will then be solved numerically in Sect. 10.3. We rewrite the one-dimensional Schrödinger equation (10.7), with $x \in \mathbb{R}$, as

$$-\frac{\hbar^2}{2m} \frac{d^2}{dx^2} \psi_n(x) + V(x) \psi_n(x) = E_n \psi_n(x) , \tag{10.23}$$

and specify

$$V(x) = \begin{cases} 0 & 0 < x < L , \\ \infty & \text{elsewhere,} \end{cases} \tag{10.24}$$

together with the boundary conditions

$$\psi_n(0) = \psi_n(L) = 0 , \tag{10.25}$$

and the normalization condition

$$\int_{-\infty}^{\infty} dx\, |\psi_n(x)|^2 = \int_0^L dx\, |\psi_n(x)|^2 = 1 . \tag{10.26}$$

We note that the boundary conditions are dictated by the particular form of the potential (10.24) which requires that $\psi_n(x) = 0$ for $x \notin [0, L]$. This problem is commonly referred to as the particle in a one-dimensional box.

Let us introduce dimensionless variables in order to simplify the numerics of Eq. (10.23). We define new variables

$$s = \frac{x}{L} , \qquad \varepsilon_n = \frac{E_n}{\bar{E}} , \tag{10.27}$$

where $L$ is the length scale and $\bar{E}$ is the energy scale. The energy scale $\bar{E}$ is fully determined by the relation

$$\bar{E} = \frac{\hbar^2}{m L^2} . \tag{10.28}$$

We note that $s \in [0, 1]$, hence the rescaled wave-function is given by

$$\varphi_n(s) = \sqrt{L}\, \psi_n(x) , \tag{10.29}$$

which satisfies the normalization condition

$$\int_0^L dx\, |\psi_n(x)|^2 = \int_0^1 ds\, |\varphi_n(s)|^2 = 1 . \tag{10.30}$$


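The rescaled Schrödinger equation can be obtained by multiplying Eq. (10.23) with $1/\bar{E}$:

$$\begin{aligned} -\frac{\hbar^2}{2 m \bar{E}} \frac{d^2}{dx^2} \psi_n(x) + \frac{V(x)}{\bar{E}} \psi_n(x) &= -\frac{L^2}{2} \frac{d^2}{dx^2} \psi_n(x) + v(s) \psi_n(x) \\ &= -\frac{1}{2} \frac{d^2}{ds^2} \psi_n(x) + v(s) \psi_n(x) \\ &= \frac{E_n}{\bar{E}} \psi_n(x) . \end{aligned} \tag{10.31}$$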




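Here we introduced the rescaled potential $v(s)$

$$v(s) = \begin{cases} 0 & 0 < s < 1 , \\ \infty & \text{elsewhere.} \end{cases} \tag{10.32}$$

Hence, the rescaled wave-function (10.29) is a solution of the differential equation:

$$-\frac{1}{2} \frac{d^2}{ds^2} \varphi_n(s) + v(s) \varphi_n(s) = \varepsilon_n \varphi_n(s) . \tag{10.33}$$

The form (10.32) of the potential implies that $\varphi_n(s) = 0$ for all $s \notin [0, 1]$ and the complete boundary value problem is defined as:

$$\begin{cases} -\dfrac{1}{2} \dfrac{d^2}{ds^2} \varphi_n(s) = \varepsilon_n \varphi_n(s) , & s \in [0, 1] , \\ \varphi_n(0) = 0 , \\ \varphi_n(1) = 0 . \end{cases} \tag{10.34}$$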


It is an easy task to solve this boundary value problem analytically. For $s \in [0, 1]$ we choose the ansatz

$$\varphi_n(s) = A_n \sin(k_n s) + B_n \cos(k_n s) , \tag{10.35}$$

where $A_n$ and $B_n$ are some constants and $k_n$ is given by

$$k_n = \sqrt{2 \varepsilon_n} . \tag{10.36}$$

From the boundary conditions we obtain

$$\varphi_n(0) = B_n = 0 , \tag{10.37}$$

and

$$\varphi_n(1) = A_n \sin(k_n) = 0 . \tag{10.38}$$

Thus,

$$k_n = n \pi , \tag{10.39}$$

and the eigenenergies $\varepsilon_n$ are quantized:

$$\varepsilon_n = \frac{n^2 \pi^2}{2} . \tag{10.40}$$

The corresponding eigenfunctions $\varphi_n(s)$ are then given by:

$$\varphi_n(s) = \begin{cases} A_n \sin(n \pi s) & s \in [0, 1] , \\ 0 & \text{elsewhere.} \end{cases} \tag{10.41}$$


The constants $A_n$ are determined from the normalization condition (10.30):³

$$\int_0^1 ds\, |\varphi_n(s)|^2 = A_n^2 \int_0^1 ds\, \sin^2(n \pi s) = \frac{A_n^2}{2} = 1 , \tag{10.43}$$

and:

$$A_n = \sqrt{2} . \tag{10.44}$$

³ Here we make use of

$$\int du\, \sin^2(u) = \frac{1}{2} \left[ u - \cos(u) \sin(u) \right] . \tag{10.42}$$

Finally, we apply the relations (10.27), (10.28), and (10.29) and obtain

$$\psi_n(x) = \frac{1}{\sqrt{L}} \varphi_n\!\left( \frac{x}{L} \right) = \begin{cases} \sqrt{\dfrac{2}{L}} \sin\!\left( \dfrac{n \pi x}{L} \right) & x \in [0, L] , \\ 0 & \text{elsewhere,} \end{cases} \tag{10.45}$$

and

$$E_n = \varepsilon_n \bar{E} = \frac{\hbar^2 \pi^2 n^2}{2 m L^2} . \tag{10.46}$$
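
To get a feeling for the magnitude of these eigenenergies, the following sketch evaluates the box energies for an electron confined to a box of 1 nm length. This example is not part of the text: the box length, the use of an electron, and the function name `box_energy` are assumptions made for illustration; the constants are CODATA 2018 values.

```python
import math

# Physical constants (CODATA 2018 values, SI units).
HBAR = 1.054571817e-34      # reduced Planck constant in J s
M_E = 9.1093837015e-31      # electron mass in kg
EV = 1.602176634e-19        # one electron volt in J

def box_energy(n, L, m=M_E):
    """Eigenenergy E_n = hbar^2 pi^2 n^2 / (2 m L^2) of the particle in a box, in J."""
    return (HBAR * math.pi * n) ** 2 / (2.0 * m * L**2)

# Ground-state energy of an electron in a box of length 1 nm (assumed example).
E1 = box_energy(1, 1e-9) / EV
print(E1)                   # roughly 0.38 eV
```

Because the energies scale with n² and with 1/L², doubling the box length lowers every level by a factor of four.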


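In most cases expectation values of some observables are to be determined. We might, for instance, be interested in the expectation value $\langle x \rangle$ of the position operator $x$ or its variance $\mathrm{var}(x) = \langle (x - \langle x \rangle)^2 \rangle$ (see also Appendix E). It follows from Eq. (10.27) that

$$\langle x \rangle = L \langle s \rangle , \quad \text{and} \quad \langle (x - \langle x \rangle)^2 \rangle = L^2 \langle (s - \langle s \rangle)^2 \rangle = L^2\, \mathrm{var}(s) . \tag{10.47}$$

Definition (10.15) gives together with solution (10.41) the expectation value $\langle s \rangle$:

$$\langle s \rangle = 2 \int_0^1 ds\, \sin^2(n \pi s)\, s = \frac{1}{2} . \tag{10.48}$$

Thus, the expectation value of the position operator is independent of the quantum number $n$. Furthermore, we obtain for $\langle s^2 \rangle$:

$$\langle s^2 \rangle = 2 \int_0^1 ds\, \sin^2(n \pi s)\, s^2 = \frac{1}{3} - \frac{1}{2 n^2 \pi^2} . \tag{10.49}$$

Hence, the variance $\mathrm{var}(s)$ is determined by

$$\langle (s - \langle s \rangle)^2 \rangle = \langle s^2 \rangle - \langle s \rangle^2 = \frac{1}{3} - \frac{1}{2 n^2 \pi^2} - \frac{1}{4} = \frac{1}{12} \left( 1 - \frac{6}{n^2 \pi^2} \right) . \tag{10.50}$$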


We  note  that  the  variance  increases  with  increasing  n. 
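
These closed-form results are easy to cross-check with a simple quadrature. The sketch below assumes NumPy and uses a plain rectangular sum on a fine grid; the function names `position_moments` and `variance_analytic` and the number of grid-points are illustrative choices, not taken from the text.

```python
import numpy as np

def position_moments(n, num_points=20001):
    """Return <s> and var(s) for the eigenfunction sqrt(2) sin(n pi s)."""
    s = np.linspace(0.0, 1.0, num_points)
    h = s[1] - s[0]
    density = 2.0 * np.sin(n * np.pi * s) ** 2      # |phi_n(s)|^2
    mean = h * np.sum(density * s)                  # rectangular sum for <s>
    second = h * np.sum(density * s**2)             # rectangular sum for <s^2>
    return mean, second - mean**2

def variance_analytic(n):
    """Closed-form variance (1/12) (1 - 6 / (n^2 pi^2))."""
    return (1.0 - 6.0 / (n * np.pi) ** 2) / 12.0

for n in range(1, 6):
    mean, var = position_moments(n)
    print(n, mean, var, variance_analytic(n))
```

For large n the variance approaches 1/12, which is the variance of a uniform distribution on the unit interval.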

In the next section these results are reproduced numerically by the Numerov shooting method. (See Sect. 8.3 and, for instance, Ref. [6].) Moreover, this numerical formulation will allow us to find solutions for more complex potentials $V(x)$.


10.3  Numerical  Solution 

The following discussion is based on the scaled Schrödinger equation (10.33), but we consider now a more general potential of the form

$$v(s) = \begin{cases} \tilde{v}(s) & 0 < s < 1 , \\ \infty & \text{elsewhere,} \end{cases} \tag{10.51}$$

which results in the boundary value problem:

$$\begin{cases} -\dfrac{1}{2} \dfrac{d^2}{ds^2} \varphi_n(s) + v(s) \varphi_n(s) = \varepsilon_n \varphi_n(s) , & s \in [0, 1] , \quad n \in \mathbb{N} , \\ \varphi_n(0) = 0 , \\ \varphi_n(1) = 0 . \end{cases} \tag{10.52}$$

As our numerical treatment will be based on shooting methods, discussed in Sect. 8.3, the second order differential equation in Eq. (10.52) will be transformed into a form which corresponds to Eq. (8.54), namely:

$$\varphi_n''(s) + 2 \left[ \varepsilon_n - \tilde{v}(s) \right] \varphi_n(s) = 0 . \tag{10.53}$$

The interval $[0, 1]$ is discretized using $N + 1$ grid-points $s_\ell = \ell / N$, $\ell = 0, 1, 2, \ldots, N$ ($h = 1/N$) and we denote with $\varphi_n(s_\ell)$ and $\tilde{v}(s_\ell) = \tilde{v}_\ell$ the values of $\varphi_n(s)$ and $\tilde{v}(s)$ at those grid-points. Thus, Eq. (8.59) can immediately be applied and we get:

$$\varphi_n(s_{\ell+1}) = \frac{2 \left[ 1 - \frac{5}{6} h^2 (\varepsilon_n - \tilde{v}_\ell) \right] \varphi_n(s_\ell) - \left[ 1 + \frac{1}{6} h^2 (\varepsilon_n - \tilde{v}_{\ell-1}) \right] \varphi_n(s_{\ell-1})}{1 + \frac{1}{6} h^2 (\varepsilon_n - \tilde{v}_{\ell+1})} . \tag{10.54}$$


We use the initial conditions $\varphi_n(s_0) = 0$ and $\varphi_n'(s_0) = 1$, which is always possible, since (10.52) is a homogeneous boundary value problem. This gives


<Pn(so)  * 


<Pn(s l)  -  <p„(s- i) 

2  h 


<Pn(S\)  = 


2 

A  ' 


(10.55) 


The normalization of the wave-function (10.30) is then approximated with the help of the forward rectangular rule (3.9):

$$\int_0^1 ds\, |\varphi_n(s)|^2 \approx h \sum_{\ell=0}^{N} \left[ \varphi_n(s_\ell) \right]^2 . \tag{10.56}$$

Consistently, we approximate the expectation value $\langle s \rangle$ via

$$\langle s \rangle \approx h \sum_{\ell=0}^{N} \left[ \varphi_n(s_\ell) \right]^2 s_\ell . \tag{10.57}$$

Table 10.1 Comparison between analytic and numerical eigenenergies for the particle in a box for N = 100

  n    ε_n (analytic)    ε_n (numeric)
  1        4.934802         4.934802
  2       19.739209        19.739208
  3       44.413219        44.413205
  4       78.956835        78.956753
  5      123.370055       123.369742
  6      177.652879       177.651943
  7      241.805308       241.802947
  8      315.827341       315.822077
  9      399.718978       399.708300
 10      493.480220       493.460113


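The Numerov shooting algorithm is then defined by the following steps:

1. Choose two trial energies $\varepsilon_a$ and $\varepsilon_b$ and define the required accuracy $\eta$.

2. Calculate $\varphi(s_N; \varepsilon_a) = \varphi_a$ and $\varphi(s_N; \varepsilon_b) = \varphi_b$ using Eq. (10.54).

3. If $\varphi_a \varphi_b > 0$, choose new values for $\varepsilon_a$ or $\varepsilon_b$ and go to step 1.

4. If $\varphi_a \varphi_b < 0$, calculate $\varepsilon_c = (\varepsilon_a + \varepsilon_b)/2$ and determine $\varphi(s_N; \varepsilon_c) = \varphi_c$ using Eq. (10.54).

5. If $\varphi_a \varphi_c < 0$, set $\varepsilon_b = \varepsilon_c$ and go to step 4.

6. If $\varphi_c \varphi_b < 0$, set $\varepsilon_a = \varepsilon_c$ and go to step 4.

7. Terminate the loop when $|\varepsilon_a - \varepsilon_b| < \eta$.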
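
The steps listed above can be sketched in a few lines of Python. This is only one possible realization under stated assumptions: NumPy is assumed to be available, the function names `numerov_endpoint` and `shoot` are hypothetical, and the start value of the wave-function at the first interior grid-point is simply set to h, which is an assumption of this sketch (for a homogeneous problem any non-zero value leads to the same eigenvalues, since only the sign of the end-point value matters).

```python
import numpy as np

def numerov_endpoint(eps, v, h):
    """Propagate the Numerov recursion, Eq. (10.54), and return phi(s_N; eps)."""
    k = 2.0 * (eps - v)                 # phi'' + k(s) phi = 0, cf. Eq. (10.53)
    w = 1.0 + h**2 * k / 12.0           # equals 1 + h^2 (eps - v) / 6
    phi_prev, phi = 0.0, h              # phi(s_0) = 0; phi(s_1) = h is an assumed start value
    for l in range(1, len(v) - 1):
        phi_next = (2.0 * (1.0 - 5.0 * h**2 * k[l] / 12.0) * phi
                    - w[l - 1] * phi_prev) / w[l + 1]
        phi_prev, phi = phi, phi_next
    return phi

def shoot(eps_a, eps_b, v, h, eta=1e-10):
    """Bisect on the trial energy until |eps_a - eps_b| < eta (steps 1 to 7)."""
    phi_a = numerov_endpoint(eps_a, v, h)
    phi_b = numerov_endpoint(eps_b, v, h)
    if phi_a * phi_b > 0.0:
        raise ValueError("no sign change: choose new trial energies")
    while abs(eps_a - eps_b) >= eta:
        eps_c = 0.5 * (eps_a + eps_b)
        phi_c = numerov_endpoint(eps_c, v, h)
        if phi_a * phi_c <= 0.0:
            eps_b, phi_b = eps_c, phi_c
        else:
            eps_a, phi_a = eps_c, phi_c
    return 0.5 * (eps_a + eps_b)

N = 100
s = np.linspace(0.0, 1.0, N + 1)
v = np.zeros(N + 1)                      # particle in a box: vanishing potential
eps1 = shoot(4.0, 6.0, v, 1.0 / N)       # bracket chosen around pi^2 / 2
print(eps1)
```

Other potentials can be treated by replacing the array `v` with the potential sampled on the grid `s`; the trial energies then have to be chosen such that they bracket exactly one eigenvalue.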

These steps have been carried out for 100 grid-points, a potential $v = 0$, and a required accuracy of $\eta = 10^{-10}$. The first ten eigenenergies are given in Table 10.1 and are compared with analytic results (10.40).

In addition, Fig. 10.1 presents the first five eigenvalues $\varepsilon_n$ (left hand scale) as horizontal straight lines. Aligned with these eigenvalues we find on the right hand side of this figure the corresponding normalized eigenfunctions calculated using $N = 100$ grid-points. The agreement with the analytic result of Eq. (10.41) is excellent.

Fig. 10.1 The first five numerically determined eigenvalues $\varepsilon_n$ of Table 10.1 are presented as horizontal lines (left hand scale). Aligned with these eigenvalues are the corresponding eigenfunctions $\varphi_n(s)$ vs $s$ for $N = 100$ (right hand scales)


10.4 Another Case

Here we discuss briefly some results achieved with the help of Numerov's shooting algorithm. In particular, we discuss the particle in the box for three different potentials $v(s)$

$$v_1(s) = 50 \cos(\pi s) , \qquad v_2(s) = 50 \exp\left[ -\frac{(s - 1/2)^2}{0.08} \right] , \qquad v_3(s) = 50 s . \tag{10.58}$$

The potentials are illustrated in Fig. 10.2. All calculations were carried out with $N = 100$ grid-points and an accuracy $\eta = 10^{-10}$. The first five eigenenergies $\varepsilon_n$ are shown in Figs. 10.3, 10.4, and 10.5, respectively, as horizontal lines (left hand scale). The numerically determined normalized eigenfunctions $\varphi_n(s)$ vs $s$ (solid lines) are presented on the right hand side of these figures and are aligned with their respective eigenvalues. They are also compared with the eigenfunctions (dotted lines) of the particle in a box, i.e. $v(s) = 0$. In all cases the eigenfunctions reflect the symmetry of the various potentials $v(s)$ which becomes particularly transparent in Fig. 10.3 for the potential $v_1(s)$. The eigenfunctions develop an additional node in comparison to the eigenfunctions calculated for $v(s) = 0$. In the other two cases only the very first eigenfunctions, $n < 3$, appear to be affected by the potential. Moreover, in all three cases, the respective eigenvalues are shifted towards higher values which is consistent with a general result of quantum mechanical perturbation theory.


Fig. 10.2 The three different potentials $v(s)$ defined in Eq. (10.58)

Fig. 10.3 Numerically determined eigenvalues $\varepsilon_n$ (left hand scale) and eigenfunctions $\varphi_n(s)$ vs $s$ (right hand scales) for the potential $v_1(s)$. The first five eigenvalues are presented as straight horizontal lines. Aligned with these lines the eigenfunctions are shown on the right hand side of this figure. The dotted lines indicate the eigenfunctions of the particle in the box with $v(s) = 0$

Fig. 10.4 Numerically determined eigenvalues $\varepsilon_n$ (left hand scale) and eigenfunctions $\varphi_n(s)$ vs $s$ (right hand scales) for the potential $v_2(s)$. The first five eigenvalues are presented as straight horizontal lines. Aligned with these lines the eigenfunctions are shown on the right hand side of this figure. The dotted lines indicate the eigenfunctions of the particle in the box with $v(s) = 0$ (see Fig. 10.1)

Fig. 10.5 Numerically determined eigenvalues $\varepsilon_n$ (left hand scale) and eigenfunctions $\varphi_n(s)$ vs $s$ (right hand scales) for the potential $v_3(s)$. The first five eigenvalues are presented as straight horizontal lines. Aligned with these lines the eigenfunctions are shown on the right hand side of this figure. The dotted lines indicate the eigenfunctions of the particle in the box with $v(s) = 0$ (see Fig. 10.1)


Summary

The quantum-mechanical problem of a particle in a box was described by a
homogeneous boundary value problem which could be solved analytically if the
box potential $v(s) = 0$. On the other hand, Numerov’s shooting algorithm was
specifically designed to treat homogeneous boundary value problems effectively.
Consequently,  the  problem  of  the  particle  in  the  box  was  used  to  design  a  Numerov 
shooting  algorithm  which  was  then  tested  against  the  analytic  results.  The  agreement 
between  numerics  and  analytical  results  turned  out  to  be  excellent  and  proved  the 
quality  of  the  method.  For  illustrative  purposes  the  problem  of  the  particle  in  the 
box  was  then  solved  numerically  for  three  different,  more  complex,  box-potentials 
v(s). 


Problems 

1. Solve the one-dimensional stationary Schrödinger equation in an infinitely
deep potential well by employing the shooting method according to Numerov
of Sect. 8.3. The total potential $v(s)$ is assumed to be of the form (10.51). Choose
different potentials $v(s)$ within the well.

You can check your code by reproducing the results presented in Sects. 10.3
and 10.4. In addition, determine numerically the expectation value $\langle x \rangle$ and the
variance $\mathrm{var}(x)$ of the position operator $x$ for the first five eigenfunctions. This
can be achieved by employing the rectangular rule of Chap. 3, as illustrated in
Eq. (10.57).

2. Solve the Schrödinger equation for some potential $v(s)$ of your choice and
plot the first five eigenfunctions. This potential should not be equal to one of the
potentials discussed in this chapter. Again, calculate $\langle x \rangle$ and $\mathrm{var}(x)$ for the first
five eigenfunctions.

3. Solve the stationary Schrödinger equation (10.4) for the harmonic potential
$V(x) = m\omega^2 x^2/2$. The algorithm discussed in this chapter can be applied by
choosing the box length $L$ sufficiently large, so that the harmonic oscillator
potential is well within the box for all energies of interest.

4. Solve the Schrödinger equation for a double well potential which can be
obtained by adding two mutually displaced harmonic potentials.

Chapter 11

Partial  Differential  Equations 


11.1  Introduction 

This section discusses some fundamental aspects of the numerics of partial differential
equations. It is restricted to methods already encountered in previous
chapters, i.e. to finite difference methods. These are particularly useful for finding
solutions of linear partial differential equations (PDEs). Nonlinear PDEs, such as
the Navier-Stokes equations, require more advanced techniques, for instance
finite element methods or finite volume methods for conservation laws. A detailed
discussion of a wide spectrum of methods can be found in many textbooks on the
numerics of PDEs; the interested reader is referred to [1-7].

Since  we  already  introduced  the  concepts  of  finite  difference  derivatives  in 
Chap.  2  and  their  application  to  boundary  value  problems  of  ordinary  differential 
equations  in  Sect.  8.2,  we  concentrate  mainly  on  the  application  of  these  methods 
to specific types of PDEs. In detail, we investigate the Poisson equation as an
example for elliptic PDEs, the time dependent heat equation as an example for
parabolic PDEs, and the wave equation as an example for hyperbolic PDEs. The
concepts  presented  here  are,  of  course,  also  applicable  to  other  problems.  However, 
in  contrast  to  the  numerics  of  ordinary  differential  equations,  there  exists  no  general 
recipe  for  the  solution  of  PDEs. 

Another important point to note is that, as in the theory of ordinary differential
equations, the problem is only fully determined when initial and/or boundary
conditions have been defined. For instance, in the case of the Poisson equation
only boundary conditions are required, while for the time-dependent heat equation
initial conditions are required as well. In general, pure boundary value problems
are easier from a numerical point of view because the question whether or not the
algorithm is stable does not play such an important role. For combined boundary
and initial value problems it is essential to check carefully that the discretization of
the time axis is not in conflict with the discretization of the space domain. This is
of particular importance in the numerical treatment of hyperbolic PDEs, where the
so-called Courant-Friedrichs-Lewy (CFL) condition determines the stability
of the algorithm. We shall come back to this point in Sects. 11.3 and 11.4. Finally,
we conclude this chapter with a discussion of the numerical solution of the
time-dependent Schrödinger equation.


11.2  The  Poisson  Equation 

We consider the Poisson equation as a model for an elliptic PDE [8, 9]. Nevertheless,
we review briefly some basics of electrodynamics [10, 11]. The force $F(r, t)$
as a function of position $r \in \mathbb{R}^3$ and time $t \in \mathbb{R}^+$ acting on a particle with charge
$q$, which moves with velocity $v$ within an electromagnetic field described by the
electric field $E(r, t)$ and the magnetic field $B(r, t)$, is determined from:

$$F(r, t) = q \left[ E(r, t) + v \times B(r, t) \right] . \tag{11.1}$$

We consider here the electrostatic case which is characterized by a zero magnetic
field [$B(r, t) = 0$] and a time independent electric field. The electric field $E$ itself is
described by the equation

$$\nabla \cdot E(r) = \frac{1}{\epsilon_0} \rho(r) , \tag{11.2}$$

where the charge density $\rho(r)$ acts as the source of the electric field $E(r)$.
Here $\epsilon_0$ is the dielectric permittivity of vacuum. Furthermore, the electric field $E$
is connected to the electrostatic potential $\varphi(r)$ via

$$E(r) = -\nabla \varphi(r) . \tag{11.3}$$

Thus, Eq. (11.2) is reformulated as:

$$\Delta \varphi(r) = -\frac{\rho(r)}{\epsilon_0} . \tag{11.4}$$


This equation is referred to as the Poisson equation and in the particular case of
$\rho(r) = 0$ it is referred to as the Laplace equation [12].

We focus now on the numerical solution of the two-dimensional Poisson
equation (11.4) on a rectangular domain $\Omega = [0, L_x] \times [0, L_y]$ together with boundary
conditions $\varphi(x, y) = g(x, y)$ on $\partial\Omega$. In detail, we want to solve the two-dimensional
boundary value problem

$$\begin{cases}
\dfrac{\partial^2}{\partial x^2} \varphi(x,y) + \dfrac{\partial^2}{\partial y^2} \varphi(x,y) = -\rho(x,y) , & (x,y) \in \Omega , \\[2mm]
\varphi(x,y) = g(x,y) , & (x,y) \in \partial\Omega ,
\end{cases} \tag{11.5}$$

where we absorbed $\epsilon_0$ into the charge density $\rho(x,y)$. Note that the
three-dimensional case can be treated analogously.

We employ a finite difference approximation to the derivatives which appear in
Eq. (11.5) (see Chap. 2) and we define grid-points in $x$ and $y$ direction via

$$x_i = x_0 + i h_x , \quad i = 0, 1, 2, \ldots, n , \tag{11.6a}$$

$$y_j = y_0 + j h_y , \quad j = 0, 1, 2, \ldots, m , \tag{11.6b}$$

where $h_x$ and $h_y$ denote the grid-spacing in $x$- and $y$-direction, respectively. As
discussed in Chap. 2 we consider only equally spaced grid-points. An extension to
non-uniform grids is straightforward.

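We define the function values on the grid-points as

$$\varphi_{i,j} = \varphi(x_i, y_j) , \tag{11.7}$$

and similarly $\rho_{i,j} = \rho(x_i, y_j)$. Consequently, we find the finite difference
approximation of Eq. (11.5):

$$\frac{\varphi_{i-1,j} - 2\varphi_{i,j} + \varphi_{i+1,j}}{h_x^2} + \frac{\varphi_{i,j-1} - 2\varphi_{i,j} + \varphi_{i,j+1}}{h_y^2} = -\rho_{i,j} . \tag{11.8}$$

The boundary conditions (11.5) can be written as

$$\varphi_{0,j} = g_{0,j} , \quad j = 0, 1, \ldots, m , \tag{11.9a}$$

$$\varphi_{n,j} = g_{n,j} , \quad j = 0, 1, \ldots, m , \tag{11.9b}$$

$$\varphi_{i,0} = g_{i,0} , \quad i = 1, 2, \ldots, n - 1 , \tag{11.9c}$$

$$\varphi_{i,m} = g_{i,m} , \quad i = 1, 2, \ldots, n - 1 . \tag{11.9d}$$

Equation (11.8) is multiplied by $-h_x^2 h_y^2/2$ and we obtain after rearranging terms

$$\left( h_x^2 + h_y^2 \right) \varphi_{i,j} - \frac{1}{2} \left[ h_y^2 \left( \varphi_{i-1,j} + \varphi_{i+1,j} \right) + h_x^2 \left( \varphi_{i,j-1} + \varphi_{i,j+1} \right) \right] = \frac{h_x^2 h_y^2}{2} \rho_{i,j} , \tag{11.10}$$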

for $i = 1, \ldots, n$ and $j = 1, \ldots, m$. There are different strategies how this set of
equations might be solved. The common strategy is to employ the assignments

$$\varphi_{1,1} \to \varphi_1 , \quad \varphi_{1,2} \to \varphi_2 , \quad \ldots , \quad \varphi_{n,m} \to \varphi_\ell , \tag{11.11}$$

where $\ell = nm$. Equation (11.10) is then rewritten as a matrix equation with a
vector of unknowns $\varphi = (\varphi_1, \varphi_2, \ldots, \varphi_\ell)^T$ according to Eq. (11.11). The boundary
conditions are to be included in the matrix. This matrix equation is then solved either
by direct or iterative methods as they are discussed in Appendix C.

It is our plan to solve Eq. (11.10) iteratively. This requires the introduction of
a superscript iteration index $t$, and $\varphi_{i,j}^t$ denotes the function value $\varphi(x_i, y_j)$ after $t$
iteration steps. There are two different implementations of an iterative solution,
namely the Gauss-Seidel or the Jacobi method (Appendix C). They differ only
in the update procedure of the function values $\varphi_{i,j}$ at the grid-points. The basic idea
is to develop an update algorithm which expresses the function values $\varphi_{i,j}^t$ with the
help of function values at already updated grid-points and of function values $\varphi_{i,j}^{t-1}$
determined in the preceding iteration step [Appendix, Eq. (C.27)].

We formulate this iteration rule as

$$\varphi_{i,j}^{t} = \frac{1}{2 \left( h_x^2 + h_y^2 \right)} \left[ h_y^2 \left( \varphi_{i-1,j}^{t} + \varphi_{i+1,j}^{t-1} \right) + h_x^2 \left( \varphi_{i,j-1}^{t} + \varphi_{i,j+1}^{t-1} \right) + h_x^2 h_y^2 \rho_{i,j} \right] , \tag{11.12}$$

where we abstained from incorporating a relaxation parameter (see Appendix C).
Note that by using the iteration rule (11.12) the boundary conditions have to be
accounted for in an additional step.

Let us specify the boundary conditions for a concrete problem: We want
to determine the electrostatic potential of an electric monopole, dipole, and
quadrupole, respectively. They are placed inside a grounded box of dimensions
$L_x$ and $L_y$. Thus, we have to impose Dirichlet boundary conditions
$\varphi(0, y) = \varphi(L_x, y) = 0$ in $x$-direction and $\varphi(x, 0) = \varphi(x, L_y) = 0$ in $y$-direction. In
this particular case the boundary conditions can be made part of Eq. (11.12) by
restricting the loop over the $x$-grid ($y$-grid) to $i = 2, \ldots, N - 1$ which leaves the
boundary points $\varphi(0, y)$ [$\varphi(x, 0)$] and $\varphi(L_x, y)$ [$\varphi(x, L_y)$] unchanged. Furthermore
we set $L_x = L_y = 10$, the number of grid-points on both axes to $n = m = 100$, and
define the domains:


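$$\Omega_1 = \left( x_{n/2-10}, x_{n/2} \right] \times \left( y_{m/2-10}, y_{m/2} \right] , \tag{11.13a}$$

$$\Omega_2 = \left( x_{n/2}, x_{n/2+10} \right] \times \left( y_{m/2-10}, y_{m/2} \right] , \tag{11.13b}$$

$$\Omega_3 = \left( x_{n/2-10}, x_{n/2} \right] \times \left( y_{m/2}, y_{m/2+10} \right] , \tag{11.13c}$$

$$\Omega_4 = \left( x_{n/2}, x_{n/2+10} \right] \times \left( y_{m/2}, y_{m/2+10} \right] . \tag{11.13d}$$

The charge density $\rho(x,y)$ is described by three different scenarios, namely the
electric monopole

$$\rho_1(x, y) = \begin{cases} 50 & (x, y) \in \Omega_1 \cup \Omega_2 \cup \Omega_3 \cup \Omega_4 , \\ 0 & \text{elsewhere,} \end{cases} \tag{11.14a}$$

the electric dipole

$$\rho_2(x, y) = \begin{cases} 50 & (x, y) \in \Omega_1 \cup \Omega_2 , \\ -50 & (x, y) \in \Omega_3 \cup \Omega_4 , \\ 0 & \text{elsewhere,} \end{cases} \tag{11.14b}$$

and the electric quadrupole:

$$\rho_3(x, y) = \begin{cases} 50 & (x, y) \in \Omega_1 \cup \Omega_4 , \\ -50 & (x, y) \in \Omega_2 \cup \Omega_3 , \\ 0 & \text{elsewhere.} \end{cases} \tag{11.14c}$$

These charge densities are illustrated in Fig. 11.1.

Fig. 11.1 The electric monopole, dipole, and quadrupole charge densities (a) $\rho_1(x, y)$, (b) $\rho_2(x, y)$,
and (c) $\rho_3(x, y)$, respectively, as defined in Eq. (11.14)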


The solution of Eq. (11.12) is regarded to be converged if the potential $\varphi(x,y)$
does not change significantly between two consecutive iteration steps, i.e.

$$\max_{i,j} \left( \left| \varphi_{i,j}^{t} - \varphi_{i,j}^{t-1} \right| \right) < \eta , \tag{11.15}$$

where $\eta = 10^{-4}$ is the required accuracy. A criterion to check the relative
change can be formulated in a similar fashion. The resulting potential profiles
$\varphi(x,y)$ are presented in Fig. 11.2. They reflect perfectly the symmetries of the
charge densities $\rho_1(x,y)$, $\rho_2(x,y)$, and $\rho_3(x,y)$, respectively. Finally, standard finite
difference methods can be applied to calculate, based on Eq. (11.3), the electric field
$E(x, y)$ from the potential profiles $\varphi(x, y)$.


1 We note that the electrostatic potentials that we calculated here numerically can also be
determined analytically with the method of mirror charges [10].

Fig. 11.2 Potential profile $\varphi(x,y)$ obtained for charge density (a) $\rho_1(x, y)$, (b) $\rho_2(x,y)$, and
(c) $\rho_3(x,y)$
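
The iteration rule (11.12), the stopping criterion (11.15) and the evaluation of the field via Eq. (11.3) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the program used to produce Fig. 11.2: the function names `gauss_seidel_poisson` and `electric_field`, the use of NumPy, the array layout `rho[i, j]` for $\rho(x_i, y_j)$, and the grounded (zero) boundary are choices made here.

```python
import numpy as np


def gauss_seidel_poisson(rho, hx, hy, eta=1e-4, max_iter=20000):
    """Gauss-Seidel sweeps for phi_xx + phi_yy = -rho with phi = 0 on the boundary.

    rho[i, j] holds the charge density at grid point (x_i, y_j).
    Returns the potential and the number of sweeps that were needed.
    """
    n, m = rho.shape
    phi = np.zeros((n, m))              # boundary rows/columns stay zero (grounded box)
    denom = 2.0 * (hx**2 + hy**2)
    for sweep in range(1, max_iter + 1):
        max_change = 0.0
        for i in range(1, n - 1):       # interior points only
            for j in range(1, m - 1):
                # phi[i-1, j] and phi[i, j-1] were already updated in this sweep,
                # phi[i+1, j] and phi[i, j+1] still hold the previous iterate.
                new = (hy**2 * (phi[i - 1, j] + phi[i + 1, j])
                       + hx**2 * (phi[i, j - 1] + phi[i, j + 1])
                       + hx**2 * hy**2 * rho[i, j]) / denom
                max_change = max(max_change, abs(new - phi[i, j]))
                phi[i, j] = new
        if max_change < eta:            # absolute-change criterion
            return phi, sweep
    raise RuntimeError("Gauss-Seidel iteration did not converge")


def electric_field(phi, hx, hy):
    """E = -grad(phi) using NumPy's finite differences (one-sided at the edges)."""
    dphi_dx, dphi_dy = np.gradient(phi, hx, hy)
    return -dphi_dx, -dphi_dy
```

Because the update overwrites `phi` in place, each sweep automatically mixes already updated neighbours with values of the previous sweep, which is what distinguishes the Gauss-Seidel from the Jacobi variant. The pure Python double loop is slow for large grids and is kept here only for readability.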

11.3  The  Time-Dependent  Heat  Equation 

We discuss here the numerical solution of the time-dependent heat equation [13,
14] which is a representative of parabolic PDEs. This equation has already been
introduced in Sect. 9.1, Eq. (9.1), and is, reduced to the one-dimensional case, of the
form

$$\frac{\partial}{\partial t} T(x, t) = \kappa \frac{\partial^2}{\partial x^2} T(x, t) , \tag{11.16}$$

with the thermal diffusivity $\kappa$. It is augmented by appropriate boundary and initial
conditions. Again, we will not discuss the extension to higher dimensions since it
is straightforward, although possibly tedious in the general case. We approximate
the right hand side of Eq. (11.16) with the help of the central finite difference
approximation (Sect. 2.2) and obtain

$$\frac{\partial}{\partial t} T_k(t) = \kappa \frac{T_{k-1}(t) - 2T_k(t) + T_{k+1}(t)}{h^2} , \tag{11.17}$$

with the usual discretization $x_k = x_0 + kh$, $k = 0, \ldots, N$, in combination with the
notation $T_k(t) = T(x_k, t)$.

The  time  derivative  in  Eq.  (11.17)  can  be  approximated  with  the  help  of  methods 
already  discussed  in  Chap.  5.  In  particular,  one  has  to  decide  whether  the  solution 
of  Eq.  (11.17)  should  be  approximated  by  an  explicit  or  an  implicit  integrator.  In 
order  to  emphasize  the  differences  between  the  two  methods,  the  application  of  the 
explicit  Euler  and  of  the  implicit  Euler  method  will  be  studied.  However,  more 
complex  integrators  may  be  applied  as  well.  In  particular,  the  Crank-Nicolson 
method  [15]  proved  to  be  very  useful  for  the  solution  of  parabolic  differential 
equations. 

We define $t_n = t_0 + n\Delta t$ and $T_k^n = T_k(t_n)$ and employ the explicit Euler
scheme (5.9) in Eq. (11.17) to get

$$\frac{T_k^{n+1} - T_k^n}{\Delta t} = \kappa \frac{T_{k-1}^n - 2T_k^n + T_{k+1}^n}{h^2} , \tag{11.18}$$

with the solution:

$$T_k^{n+1} = T_k^n + \kappa \Delta t \, \frac{T_{k-1}^n - 2T_k^n + T_{k+1}^n}{h^2} . \tag{11.19}$$


The right hand side of this equation depends only on temperatures of the previous
time step, since we used an explicit method. Although this might seem advantageous
at first glance, it turns out that the above scheme is not stable for arbitrary choices
of $\Delta t$ and $h$. In particular, it is possible to prove that the above discretization is stable
only for

$$\frac{\kappa \Delta t}{h^2} \le \frac{1}{2} . \tag{11.20}$$

A detailed discussion and proof of this property can be found in any advanced
textbook on numerics of PDEs [1-5].
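
The explicit update (11.19) and the role of the stability bound (11.20) can be tried out with a short Python sketch. The function name `heat_explicit_euler`, the use of NumPy, and the convention that the first and last array entries carry fixed Dirichlet values are assumptions made for this illustration.

```python
import numpy as np


def heat_explicit_euler(T_init, kappa, dt, h, n_steps):
    """Advance T by n_steps explicit Euler steps of the discretized heat equation.

    T_init[0] and T_init[-1] are treated as fixed (Dirichlet) boundary values.
    """
    T = np.array(T_init, dtype=float)
    r = kappa * dt / h**2               # stability requires r <= 1/2
    for _ in range(n_steps):
        # The right hand side is evaluated completely before it is stored,
        # so only values of the previous time step enter the update.
        T[1:-1] = T[1:-1] + r * (T[:-2] - 2.0 * T[1:-1] + T[2:])
    return T
```

Calling the function once with $\kappa \Delta t / h^2$ slightly below and once slightly above $1/2$ shows the qualitative difference: in the first case the profile relaxes towards the linear steady state, in the second case grid-scale oscillations grow from step to step.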

On the other hand, if we apply the implicit Euler method (5.10) to solve
Eq. (11.17) we obtain

$$\frac{T_k^{n+1} - T_k^n}{\Delta t} = \kappa \frac{T_{k-1}^{n+1} - 2T_k^{n+1} + T_{k+1}^{n+1}}{h^2} , \tag{11.21}$$

which is unconditionally stable. However, Eq. (11.21) is an implicit equation, i.e.
the function values $T_{k-1}^{n+1}$ and $T_{k+1}^{n+1}$ are required in order to evaluate $T_k^{n+1}$. Hence,
Eq. (11.21) has to be solved as a system of linear equations. This system may be
written as

$$A T^{n+1} = T^n + F , \tag{11.22}$$

with the vector $T^n = (T_0^n, T_1^n, \ldots, T_N^n)^T$ and the tridiagonal matrix $A$:

$$A = \begin{pmatrix}
\ddots & \ddots & \ddots & & \\
 & -\frac{\kappa \Delta t}{h^2} & 1 + \frac{2 \kappa \Delta t}{h^2} & -\frac{\kappa \Delta t}{h^2} & \\
 & & \ddots & \ddots & \ddots
\end{pmatrix} . \tag{11.23}$$

The boundary conditions are incorporated in the matrix $A$ and in the vector $F$ (see
Sects. 9.2 and 9.3). The linear system of equations (11.22) can be solved numerically
using a direct or an iterative method. Employing an iterative method imposes a third
index $t$ on the function values of the temperature $T$ which accounts for the iteration
step.
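
A direct solution of the linear system (11.22) may look as follows. This is only a sketch: the names `implicit_euler_matrix` and `heat_implicit_euler` are hypothetical, the matrix is stored densely and handed to `numpy.linalg.solve` for brevity, and the Dirichlet conditions are imposed by identity rows for $k = 0$ and $k = N$ (so that $F = 0$ and the boundary values are carried in $T^n$), which is one of several possible ways to incorporate them.

```python
import numpy as np


def implicit_euler_matrix(N, r):
    """Matrix A for implicit Euler with r = kappa*dt/h**2 on grid points 0..N."""
    A = np.eye(N + 1)                   # rows 0 and N stay identity: fixed boundary values
    for k in range(1, N):
        A[k, k - 1] = -r
        A[k, k] = 1.0 + 2.0 * r
        A[k, k + 1] = -r
    return A


def heat_implicit_euler(T_init, kappa, dt, h, n_steps):
    """Advance T by n_steps implicit Euler steps, solving A T^{n+1} = T^n each step."""
    T = np.array(T_init, dtype=float)
    A = implicit_euler_matrix(len(T) - 1, kappa * dt / h**2)
    for _ in range(n_steps):
        T = np.linalg.solve(A, T)
    return T
```

For large $N$ a dedicated tridiagonal solver would be preferable to a dense solve, since it needs only $O(N)$ operations per time step.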

Let us give a brief numerical example. We consider the time-dependent homogeneous
heat equation (11.16) on a finite interval $[0, L]$ together with the boundary
conditions of Sect. 9.1:

$$T(0) = T_0 , \quad T(L) = T_N . \tag{11.24}$$

In addition we introduce the initial condition

$$T(x, 0) = 0 , \quad x \in [0, L] . \tag{11.25}$$

Figure 11.3 presents the time evolution of $T(x, t)$ at six different time-steps as
it was obtained with the explicit Euler method (11.19). Here we chose $T_0 = 0$,
$T_N = 2$, $N = 20$, $L = 10$, $\kappa = 1$ as well as $\Delta t \approx 0.5$. Note that for this choice of
parameters, the condition (11.20) is fulfilled since $h \approx 1.05$ and therefore

$$\frac{\kappa \Delta t}{h^2} \approx 0.45 < \frac{1}{2} . \tag{11.26}$$

Figure 11.4 corresponds to Fig. 11.3 but now $\Delta t$ was chosen to be approximately
0.7 and:

$$\frac{\kappa \Delta t}{h^2} \approx 0.63 > \frac{1}{2} . \tag{11.27}$$

Thus, the stability criterion was violated and the solutions became unstable. Finally,
Fig. 11.5 presents results obtained with the same parameters as for Fig. 11.4 but with
the help of the implicit Euler method (11.21). Obviously, this procedure provides
a stable solution of the problem.

Fig. 11.3 Solutions of the time-dependent heat equation $T(x)$ vs $x$ generated by the explicit Euler
method. The stability criterion (11.20) is fulfilled. Results after $n = 25$, 50, 100, 150, and 300 time
steps are presented; $n = 0$ represents the initial conditions

Fig. 11.4 Solutions of the time-dependent heat equation $T(x)$ vs $x$ generated by the explicit Euler
method. The stability criterion (11.20) is not fulfilled and, therefore, the solution is apparently
unstable. Results after $n = 25$, 50, 100, 150, and 200 time steps are presented; $n = 0$ represents
the initial conditions

Fig. 11.5 Solutions of the time-dependent heat equation $T(x)$ vs $x$ generated by the implicit
Euler method. Results after $n = 25$, 50, 100, 150, and 300 time steps are presented; $n = 0$
represents the initial conditions


11.4  The  Wave  Equation 


As a model hyperbolic PDE we consider briefly the wave equation [16]. Again, we
regard only the one-dimensional case:

$$\frac{\partial^2}{\partial t^2} u(x, t) = c^2 \frac{\partial^2}{\partial x^2} u(x, t) . \tag{11.28}$$

Here, $c$ is the speed at which the wave $u(x, t)$ propagates. Equation (11.28) is to
be augmented by appropriate boundary and initial conditions. A finite difference
approach similar to the one discussed in Sect. 11.3 will be employed and the
discussion will be restricted to the explicit Euler approximation. Consequently,
Eq. (11.28) is replaced by

$$\frac{u_k^{n-1} - 2u_k^n + u_k^{n+1}}{\Delta t^2} = c^2 \frac{u_{k-1}^n - 2u_k^n + u_{k+1}^n}{h^2} . \tag{11.29}$$

We define the parameter $\lambda = c\Delta t/h$ and solve Eq. (11.29) for $u_k^{n+1}$:

$$u_k^{n+1} = 2 \left( 1 - \lambda^2 \right) u_k^n - u_k^{n-1} + \lambda^2 \left( u_{k-1}^n + u_{k+1}^n \right) . \tag{11.30}$$

We note two important points: (i) The solution for time step $n + 1$ can only be
determined if the solutions for the time steps $n$ and $n - 1$ are known. In particular,
the solutions for $n = 0$ and $n = 1$ are required to obtain the solution for $n = 2$. The
function values for $n = 1$ can be obtained from the initial conditions which must
include a first order time derivative of $u(x, t)$ since Eq. (11.28) is a second order
differential equation with respect to time $t$. (ii) As in the case of parabolic problems,
the explicit Euler approximation (11.30) will not be stable for arbitrary values of
$\lambda$. It is only stable for

$$\frac{c \Delta t}{h} \le 1 . \tag{11.31}$$

This condition is referred to as the Courant-Friedrichs-Lewy or CFL condition
[17, 18]. Its importance stems from the fact that this condition is not limited to the
wave equation but holds for hyperbolic problems in general. In particular, since the
wave equation can always be viewed as a combination of a right- and a left-going
advection equation, i.e.

$$\frac{\partial}{\partial t} u(x, t) = \pm c \frac{\partial}{\partial x} u(x, t) , \tag{11.32}$$

we gain the very important property that explicit time integrators applied to solve
equations of the type (11.32) are only stable if relation (11.31) is obeyed.

Let us return to the discretization (11.30). Suppose we have initial conditions of
the form

$$u(x, 0) = f(x) , \quad \frac{\partial}{\partial t} u(x, 0) = g(x) . \tag{11.33}$$

They can be approximated by

$$u_k^0 = f_k , \quad \frac{u_k^1 - u_k^0}{\Delta t} = g_k , \tag{11.34}$$

and the solution of the second relation in (11.34) yields the desired function values
for $n = 1$:

$$u_k^1 = u_k^0 + \Delta t \, g_k . \tag{11.35}$$

However, in many cases it is beneficial to take higher order terms into account. This
can be achieved by employing a Taylor expansion of the form (Chap. 2):

$$\frac{u_k^1 - u_k^0}{\Delta t} = \frac{\partial}{\partial t} u(x_k, 0) + \frac{\Delta t}{2} \frac{\partial^2}{\partial t^2} u(x_k, 0) + \mathcal{O}(\Delta t^2) . \tag{11.36}$$

We make now use of the initial conditions (11.33), employ the wave equation (11.28),
and solve for $u_k^1$. This gives

$$u_k^1 = u_k^0 + \Delta t \, g_k + \frac{\Delta t^2 c^2}{2} f_k'' + \mathcal{O}(\Delta t^3) . \tag{11.37}$$

Here we assumed that the second spatial derivative $f_k'' = \frac{d^2}{dx^2} f(x_k)$ of the initial
condition $f(x)$ exists. It may then be approximated by a finite difference approach.
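
The three-level scheme (11.30) together with the second order starting step (11.37) can be sketched as follows. The function name `wave_explicit`, the use of NumPy, and the restriction to fixed ends $u = 0$ at both boundaries are assumptions made for this illustration; $f''$ is approximated by central differences on the interior grid points.

```python
import numpy as np


def wave_explicit(f, g, c, dt, h, n_steps):
    """Return u after n_steps time steps of the explicit scheme, u = 0 at both ends.

    f and g sample the initial displacement and the initial velocity on the grid.
    """
    lam2 = (c * dt / h) ** 2                    # square of the CFL parameter
    u_old = np.array(f, dtype=float)            # time level n = 0
    if n_steps == 0:
        return u_old
    # Time level n = 1 from the Taylor expansion; f'' by central differences.
    f_xx = np.zeros_like(u_old)
    f_xx[1:-1] = (u_old[:-2] - 2.0 * u_old[1:-1] + u_old[2:]) / h**2
    u = u_old + dt * np.asarray(g, dtype=float) + 0.5 * (c * dt) ** 2 * f_xx
    u[0] = u[-1] = 0.0
    for _ in range(n_steps - 1):
        u_new = np.zeros_like(u)                # end points remain zero
        u_new[1:-1] = (2.0 * (1.0 - lam2) * u[1:-1] - u_old[1:-1]
                       + lam2 * (u[:-2] + u[2:]))
        u_old, u = u, u_new
    return u
```

Only two time levels are kept in memory, which suffices because the update for level $n + 1$ involves the levels $n$ and $n - 1$ exclusively. Running the function with `c * dt / h` slightly larger than one lets the shortest resolvable wavelengths grow exponentially, in line with the CFL condition (11.31).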

To be specific we consider a vibrating string of length $L$, which is fixed at its
ends, i.e. $u(0, t) = u(L, t) = 0$. Furthermore, we assume that the string was initially
at rest, i.e.

$$\frac{\partial}{\partial t} u(x, 0) = 0 , \tag{11.38}$$

and impose initial conditions

$$u(x, 0) = \begin{cases} \ldots & \\ 0 & \text{elsewhere.} \end{cases} \tag{11.39}$$

Figure 11.6 presents results obtained with $L = 1$, $c = 2$, $N = 100$. $\Delta t$ was
chosen in such a way that $\lambda = 0.5$. On the other hand, Fig. 11.7 presents calculations
performed with the same parameters but now $\lambda$ was set to 1.01. Thus, the CFL
condition (11.31) was violated and the solutions become unstable.

Fig. 11.6 Solutions of the wave equation $u(x)$ vs $x$ generated by the explicit Euler method with
$\lambda = 0.5$. Results after $n = 25$, 50, 100, 150, and 200 time steps are presented; $n = 0$ represents
the initial conditions

Fig. 11.7 Solutions of the wave equation $u(x)$ vs $x$ generated by the explicit Euler method with
$\lambda = 1.01$. Results after $n = 25$, 50, 100, 150, and 200 time steps are presented; $n = 0$ represents
the initial conditions

In  general,  the  numerical  solution  of  hyperbolic  PDEs  can  be  very  difficult 
to  obtain  since  in  many  cases  these  equations  represent  conservation  laws.  A 
very  popular  class  of  methods  in  this  context  is  referred  to  as  finite  volume 
methods.  A  detailed  discussion  of  these  methods  can  be  found  in  the  book  by 
R.  J.  LeVeque  [6]. 


11.5 The Time-Dependent Schrödinger Equation

We already came across the time-dependent Schrödinger equation in Chap. 10.
It reads

$$i\hbar \frac{\partial}{\partial t} \Psi(x, t) = H \Psi(x, t) , \tag{11.40}$$

where $\hbar$ is the reduced Planck constant, $\Psi(x, t)$ is the wave function, and $H$ is the
Hamilton operator. Since the Schrödinger equation contains a complex coefficient,
it cannot be categorized as a PDE of one of the familiar types, i.e. elliptic,
parabolic or hyperbolic. In fact, the Schrödinger equation shows parabolic as
well as hyperbolic behavior (it is of the form of the diffusion equation but allows
for wave solutions). We discuss here briefly a very elegant method developed to
numerically approximate solutions of the time-dependent Schrödinger equation.
A prominent alternative method, the split operator technique, is briefly explained in
Appendix D.

We note that Eq. (11.40) has the formal solution

$$\Psi(x, t) = \exp\left( -\frac{i t}{\hbar} H \right) \Psi(x, 0) = U(t) \Psi(x, 0) , \tag{11.41}$$

where we assumed that $H$ is independent of time $t$. We note that the operator $U(t)$
on the right hand side of Eq. (11.41) propagates the solution in time. Furthermore, it
is a unitary operator and therefore preserves the norm of the wave-function $\Psi(x, t)$.
$U(t)$ is usually referred to as the unitary time-evolution operator [19].

We employ relation (11.41) and obtain

$$\Psi(x, t + \Delta t) = \exp\left[ -\frac{i (t + \Delta t)}{\hbar} H \right] \Psi(x, 0) = \exp\left( -\frac{i \Delta t}{\hbar} H \right) \Psi(x, t) . \tag{11.42}$$

Expanding the exponential in this equation in its series representation and truncating
the series after the second term results in the approximation

$$\Psi(x, t + \Delta t) \approx \left( 1 - \frac{i \Delta t}{\hbar} H \right) \Psi(x, t) . \tag{11.43}$$

Again, we introduce grid-points $x_k = k\Delta x$, $k \in \mathbb{N}$ and the correspondingly indexed
functions $\Psi_k^n = \Psi(x_k, n\Delta t)$ which results in

$$\Psi_k^{n+1} = \left( 1 - \frac{i \Delta t}{\hbar} H \right) \Psi_k^n . \tag{11.44}$$

Using Eq. (10.23) for the Hamiltonian in its position space representation in the
one-dimensional case and by approximating the second derivative with the help of the
central difference approximation we arrive at

$$\Psi_k^{n+1} = \Psi_k^n - \frac{i \Delta t}{\hbar} \left( -\frac{\hbar^2}{2m} \frac{\Psi_{k-1}^n - 2\Psi_k^n + \Psi_{k+1}^n}{\Delta x^2} + V_k \Psi_k^n \right) , \tag{11.45}$$

where we defined $V_k = V(x_k)$.

The iteration scheme (11.45) resembles the explicit Euler approximation (11.18)
of the heat equation with the difference that we have here an
imaginary coefficient. An implicit procedure for the time-dependent Schrödinger
equation (11.40) can be obtained by inversion of Eq. (11.42):

$$\Psi(x, t) = U^{-1}(\Delta t) \Psi(x, t + \Delta t) = \exp\left( \frac{i \Delta t}{\hbar} H \right) \Psi(x, t + \Delta t) . \tag{11.46}$$

A series expansion of the exponential results in the desired relation:

$$\Psi_k^n = \left( 1 + \frac{i \Delta t}{\hbar} H \right) \Psi_k^{n+1} . \tag{11.47}$$


2 We remember that unitary means that $U U^\dagger = U^\dagger U = 1$.

We  emphasize  that  the  unitarity  of  the  time-evolution  operator  is  of  fundamental 
importance  since  it  preserves  the  norm  of  the  wave-function.  However,  in  truncating 
the  series  representation  of  the  unitary  time  evolution  operator  U(At)  we  certainly 
violate  the  unitarity  of  U (At).  This  problem  can  be  remedied  by  imposing  unitarity 
of  the  time  evolution  as  an  additional  requirement.  This  requirement  can  be 
incorporated  by  normalizing  the  wave-function  after  each  time  step. 
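
The loss of unitarity can be made explicit with a short worked check. Assume, purely for illustration, that the wave-function is a stationary state, $H\Psi = E\Psi$ with a real energy $E$ (this assumption is introduced here and is not part of the schemes above). One step of the explicit scheme multiplies $\Psi$ by $1 - iE\Delta t/\hbar$, one step of the implicit scheme by $(1 + iE\Delta t/\hbar)^{-1}$, so that the norm changes by the factors

$$\left| 1 - \frac{i E \Delta t}{\hbar} \right|^2 = 1 + \left( \frac{E \Delta t}{\hbar} \right)^2 > 1 , \qquad
\left| \frac{1}{1 + i E \Delta t/\hbar} \right|^2 = \frac{1}{1 + \left( E \Delta t/\hbar \right)^2} < 1 ,$$

respectively, for $E \neq 0$. The explicit truncation therefore inflates the norm in every time step, whereas the implicit truncation damps it; in both cases the deviation from one is of order $\Delta t^2$ per step. In contrast, a symmetric factor of the form $(1 - ix)/(1 + ix)$ with real $x$ has modulus exactly one.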

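We demonstrate now that the Crank-Nicolson scheme [15] can be applied
successfully to solve Eq. (11.40) numerically for a particular potential. The
Crank-Nicolson scheme can be obtained by realizing that

$$U(\Delta t) = \exp\left( -\frac{i \Delta t}{\hbar} H \right) = \exp\left( -\frac{i \Delta t}{2\hbar} H \right) \exp\left( -\frac{i \Delta t}{2\hbar} H \right) = \left[ \exp\left( \frac{i \Delta t}{2\hbar} H \right) \right]^{-1} \exp\left( -\frac{i \Delta t}{2\hbar} H \right) . \tag{11.48}$$

Hence, we obtain from Eq. (11.45)

$$\exp\left( \frac{i \Delta t}{2\hbar} H \right) \Psi(x, t + \Delta t) = \exp\left( -\frac{i \Delta t}{2\hbar} H \right) \Psi(x, t) , \tag{11.49}$$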

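or by expanding $U$ in a series and truncating after the second term

$$\left( 1 + \frac{i \Delta t}{2\hbar} H \right) \Psi(x, t + \Delta t) = \left( 1 - \frac{i \Delta t}{2\hbar} H \right) \Psi(x, t) . \tag{11.50}$$

Inserting the finite difference approximation of the Hamiltonian $H$ and rearranging
terms yields

$$\left[ 1 + \frac{i \Delta t}{2\hbar} \left( \frac{\hbar^2}{m \Delta x^2} + V_k \right) \right] \Psi_k^{n+1} - \frac{i \Delta t \hbar}{4 m \Delta x^2} \left( \Psi_{k-1}^{n+1} + \Psi_{k+1}^{n+1} \right) = \Omega_k^n , \tag{11.51}$$

where we defined $\Omega_k^n$ as

$$\Omega_k^n = \left[ 1 - \frac{i \Delta t}{2\hbar} \left( \frac{\hbar^2}{m \Delta x^2} + V_k \right) \right] \Psi_k^n + \frac{i \Delta t \hbar}{4 m \Delta x^2} \left( \Psi_{k-1}^n + \Psi_{k+1}^n \right) . \tag{11.52}$$

Both sides of Eq. (11.51) are now multiplied by $i 4 m \Delta x^2/(\hbar \Delta t)$ and this gives

$$\Psi_{k-1}^{n+1} + 2 \left( \frac{i 2 m \Delta x^2}{\Delta t \hbar} - 1 - \frac{m \Delta x^2}{\hbar^2} V_k \right) \Psi_k^{n+1} + \Psi_{k+1}^{n+1} = \Omega_k^n , \tag{11.53}$$

where

$$\Omega_k^n = -\Psi_{k-1}^n + 2 \left( \frac{i 2 m \Delta x^2}{\Delta t \hbar} + 1 + \frac{m \Delta x^2}{\hbar^2} V_k \right) \Psi_k^n - \Psi_{k+1}^n . \tag{11.54}$$

We recognize that Eq. (11.53) establishes a system of linear equations and rewrite
it in matrix form:

$$A \Psi^{n+1} = \Omega^n . \tag{11.55}$$

Here, we defined the vectors $\Psi^n = (\Psi_0^n, \Psi_1^n, \ldots, \Psi_N^n)^T$, $\Omega^n = (\Omega_0^n, \Omega_1^n, \ldots, \Omega_N^n)^T$
and the tridiagonal matrix

$$A = \begin{pmatrix}
\ddots & \ddots & \ddots & & \\
 & 1 & \xi_k & 1 & \\
 & & \ddots & \ddots & \ddots
\end{pmatrix} , \tag{11.56}$$

with $\xi_k$ for $k = 1, 2, \ldots, N$ given by

$$\xi_k = 2 \left( \frac{i 2 m \Delta x^2}{\Delta t \hbar} - 1 - \frac{m \Delta x^2}{\hbar^2} V_k \right) , \tag{11.57}$$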

according to Eq. (11.53).

The system (11.56) is solved iteratively. However, in this case we employ a more
elegant ansatz which is allowed for tridiagonal matrices. We set

$$\Psi_{k+1}^{n+1} = a_k \Psi_k^{n+1} + b_k^n , \tag{11.58}$$

and apply it to Eq. (11.53). After rearranging terms we arrive at:

$$\left[ 2 \left( 1 + \frac{m \Delta x^2}{\hbar^2} V_k - \frac{i 2 m \Delta x^2}{\Delta t \hbar} \right) - a_k \right] \Psi_k^{n+1} = \Psi_{k-1}^{n+1} + b_k^n - \Omega_k^n . \tag{11.59}$$

We define

$$\alpha_k = 2 \left( 1 + \frac{m \Delta x^2}{\hbar^2} V_k - \frac{i 2 m \Delta x^2}{\Delta t \hbar} \right) - a_k , \tag{11.60}$$

and obtain from Eq. (11.59)

$$\Psi_k^{n+1} = \frac{1}{\alpha_k} \Psi_{k-1}^{n+1} + \frac{b_k^n - \Omega_k^n}{\alpha_k} . \tag{11.61}$$

However, due to the ansatz (11.58) we also have

$$\Psi_k^{n+1} = a_{k-1} \Psi_{k-1}^{n+1} + b_{k-1}^n , \tag{11.62}$$

which results in the relations

$$a_{k-1} = \frac{1}{\alpha_k} , \tag{11.63}$$

and

$$b_{k-1}^n = \frac{b_k^n - \Omega_k^n}{\alpha_k} = \left( b_k^n - \Omega_k^n \right) a_{k-1} . \tag{11.64}$$

Equation (11.63) leads to the recursion relation

$$a_k = 2 \left( 1 + \frac{m \Delta x^2}{\hbar^2} V_k - \frac{i 2 m \Delta x^2}{\Delta t \hbar} \right) - \frac{1}{a_{k-1}} , \tag{11.65}$$

and we derive from Eq. (11.64):

$$b_k^n = \frac{b_{k-1}^n}{a_{k-1}} + \Omega_k^n . \tag{11.66}$$



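The remaining question is how to choose $a_0$ and $b_0^n$. We impose the boundary
conditions $\Psi_0^n = 0$ and $\Psi_N^n = 0$ and derive from Eq. (11.53):

$$\Psi_2^{n+1} = -2 \left( \frac{i 2 m \Delta x^2}{\Delta t \hbar} - 1 - \frac{m \Delta x^2}{\hbar^2} V_1 \right) \Psi_1^{n+1} + \Omega_1^n . \tag{11.67}$$

A comparison of this equation with the ansatz (11.58), i.e. $\Psi_2^{n+1} = a_1 \Psi_1^{n+1} + b_1^n$,
reveals that

$$a_1 = 2 \left( 1 + \frac{m \Delta x^2}{\hbar^2} V_1 - \frac{i 2 m \Delta x^2}{\Delta t \hbar} \right) , \tag{11.68}$$

and

$$b_1^n = \Omega_1^n . \tag{11.69}$$

These expressions are equivalent to $a_0 = \infty$ and it is, thus, impossible to
calculate $\Psi_1^{n+1}$ from $\Psi_0^{n+1}$. However, we can determine the function values $\Psi_k^{n+1}$
via a backward recursion

$$\Psi_k^{n+1} = \frac{1}{a_k} \left( \Psi_{k+1}^{n+1} - b_k^n \right) , \tag{11.70}$$

which is initialized with the boundary condition $\Psi_N^{n+1} = 0$. We can now summarize
the algorithm: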

1. Choose the initial conditions $\psi_k^0$, $k = 0, 1, \ldots, N$ which satisfy the boundary
conditions $\psi_0^0 = 0$ and $\psi_N^0 = 0$.

2. Set

$$a_1 = 2\left(1 + \frac{m\Delta x^2}{\hbar^2} V_1 - \frac{2 i m \Delta x^2}{\Delta t\,\hbar}\right) , \tag{11.71}$$

and calculate for $k = 2, \ldots, N-1$

$$a_k = 2\left(1 + \frac{m\Delta x^2}{\hbar^2} V_k - \frac{2 i m \Delta x^2}{\Delta t\,\hbar}\right) - \frac{1}{a_{k-1}} . \tag{11.72}$$

3. Start the time loop: $n = 0, 1, \ldots, M$, with $M$ the maximum number of time steps.

4. Calculate for $k = 1, 2, \ldots, N-1$

$$\Omega_k^n = -\psi_{k+1}^n + 2\left(\frac{2 i m \Delta x^2}{\Delta t\,\hbar} + 1 + \frac{m\Delta x^2}{\hbar^2} V_k\right) \psi_k^n - \psi_{k-1}^n . \tag{11.73}$$




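5. Set

$$b_1^n = \Omega_1^n , \tag{11.74}$$

and calculate for $k = 2, \ldots, N-1$

$$b_k^n = \frac{b_{k-1}^n}{a_{k-1}} + \Omega_k^n . \tag{11.75}$$

6. Calculate for $k = N-1, N-2, \ldots, 1$

$$\psi_k^{n+1} = \frac{1}{a_k}\left(\psi_{k+1}^{n+1} - b_k^n\right) , \tag{11.76}$$

where the boundary conditions $\psi_0^{n+1} = \psi_N^{n+1} = 0$ are to be considered.

7. Set $n = n + 1$ and go to step 4.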
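
The seven steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `coefficients_a` and `crank_nicolson_step`, the use of plain lists and the small test grid are assumptions made here. The coefficients $a_k$ do not depend on time and are therefore computed once (step 2), while $\Omega_k^n$, $b_k^n$ and the backward recursion are evaluated in every time step (steps 4 to 6).

```python
import cmath


def coefficients_a(V, dx, dt, m=1.0, hbar=1.0):
    """Step 2: a_1 from Eq. (11.71) and a_k, k = 2..N-1, from Eq. (11.72)."""
    N = len(V) - 1
    g = 2j * m * dx**2 / (dt * hbar)           # 2 i m dx^2 / (dt hbar)

    def alpha(k):                               # Eq. (11.60)
        return 2.0 * (1.0 + m * dx**2 * V[k] / hbar**2 - g)

    a = [0j] * (N + 1)                          # entries 0 and N are not used
    a[1] = alpha(1)
    for k in range(2, N):
        a[k] = alpha(k) - 1.0 / a[k - 1]
    return a


def crank_nicolson_step(psi, a, V, dx, dt, m=1.0, hbar=1.0):
    """Steps 4 to 6: advance psi^n to psi^{n+1} with psi_0 = psi_N = 0."""
    N = len(psi) - 1
    g = 2j * m * dx**2 / (dt * hbar)
    # Step 4: right-hand side Omega_k^n, Eq. (11.73)
    omega = [0j] * (N + 1)
    for k in range(1, N):
        omega[k] = (-psi[k + 1]
                    + 2.0 * (g + 1.0 + m * dx**2 * V[k] / hbar**2) * psi[k]
                    - psi[k - 1])
    # Step 5: forward recursion for b_k^n, Eqs. (11.74) and (11.75)
    b = [0j] * (N + 1)
    b[1] = omega[1]
    for k in range(2, N):
        b[k] = b[k - 1] / a[k - 1] + omega[k]
    # Step 6: backward recursion, Eq. (11.76), started from psi_N^{n+1} = 0
    new = [0j] * (N + 1)
    for k in range(N - 1, 0, -1):
        new[k] = (new[k + 1] - b[k]) / a[k]
    return new


# Small demonstration on a hypothetical grid (values chosen for illustration).
N, dx, dt = 200, 1.0, 0.1
V = [0.7 if 120 <= k <= 130 else 0.0 for k in range(N + 1)]
psi = [cmath.exp(2j * k * dx) * cmath.exp(-(k * dx - 60.0)**2 / 200.0)
       for k in range(N + 1)]
psi[0] = psi[N] = 0j                             # enforce the boundary conditions
a = coefficients_a(V, dx, dt)
for n in range(10):                              # step 3: time loop
    psi = crank_nicolson_step(psi, a, V, dx, dt)
print(sum(abs(p)**2 for p in psi))
```

Since the Crank-Nicolson propagator is unitary, the sum of $|\psi_k^n|^2$ over the grid should stay constant up to rounding errors, which is a convenient check of an implementation.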

The application of this algorithm is now elucidated with the help of a specific
example, the quantum mechanical tunneling effect. The initial condition is described
by a Gauss wave packet

$$\psi(x, 0) = \exp(iqx)\, \exp\left[-\frac{(x - x_0)^2}{2\sigma^2}\right] , \tag{11.77}$$

centered at $x = x_0$ which propagates in positive $x$-direction with momentum $q$. This
wave-function is not yet normalized. Furthermore, we regard the single potential
barrier

$$V_1(x) = \begin{cases} V_0 & x \in [a, b] , \\ 0 & \text{elsewhere,} \end{cases} \tag{11.78}$$

or the double potential barrier

$$V_2(x) = \begin{cases} V_0 & x \in [a, b] \cup [c, d] , \\ 0 & \text{elsewhere.} \end{cases} \tag{11.79}$$

The scales and parameters are chosen in the following way: $L = 500$, $\Delta x = 1$,
$\Delta t = 0.1$, $m = \hbar = 1$, $x_0 = 200$, $q = 2$, $\sigma = 20$, $V_0 = 0.7$, $a = 250$, $b = 260$,
$c = 300$, and $d = 310$. Figure 11.8 presents the time evolution of the square modulus of
the resulting wave-function $|\psi(x, n\Delta t)|^2$ vs $x$ (solid line, left hand scale) at different
time steps $n = 500$, 1000, and 1500. The time step $n = 0$ corresponds to the initial
condition. The potential $V_1(x)$ vs $x$ is also plotted (dashed line, right hand scale).
Figure 11.9 corresponds to Fig. 11.8 but now the potential is described by $V_2(x)$ and
additional time steps for $n = 2000$ and 2500 have been added.

Fig. 11.8 Time evolution of the square modulus of the wave-function $|\psi(x)|^2$ vs $x$ (solid line, left
hand scale). The potential $V(x) = V_1(x)$ is also plotted vs $x$ (dashed line, right hand scale). We
present the results for $n = 500$, 1000, and 1500 time steps. The graph labeled by $n = 0$ represents
the initial configuration
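
The setup of this example can be sketched as follows. The grid, the wave packet (11.77) and the barriers (11.78) and (11.79) use the parameters quoted above. The variable and function names, the explicit enforcement of the boundary values and the normalization step are assumptions of this sketch, since the text only remarks that the wave packet is not yet normalized.

```python
import cmath

# Parameters as quoted in the text (units with m = hbar = 1).
L, dx = 500.0, 1.0
x0, q, sigma = 200.0, 2.0, 20.0
V0, a, b, c, d = 0.7, 250.0, 260.0, 300.0, 310.0

N = int(round(L / dx))                       # grid points x_k = k dx, k = 0..N
x = [k * dx for k in range(N + 1)]


def wave_packet(xk):
    """Gauss wave packet, Eq. (11.77), not normalized."""
    return cmath.exp(1j * q * xk) * cmath.exp(-(xk - x0)**2 / (2.0 * sigma**2))


def V1(xk):
    """Single barrier, Eq. (11.78)."""
    return V0 if a <= xk <= b else 0.0


def V2(xk):
    """Double barrier, Eq. (11.79)."""
    return V0 if (a <= xk <= b or c <= xk <= d) else 0.0


psi0 = [wave_packet(xk) for xk in x]
psi0[0] = psi0[N] = 0j                       # boundary conditions psi_0 = psi_N = 0

# Assumed extra step: normalize so that sum |psi_k|^2 dx = 1.
norm = (sum(abs(p)**2 for p in psi0) * dx) ** 0.5
psi0 = [p / norm for p in psi0]

potential_single = [V1(xk) for xk in x]
potential_double = [V2(xk) for xk in x]
```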

In  both  figures  a  typical  quantum  mechanical  effect  which  is  referred  to  as 
tunneling  can  be  observed.  In  particular,  there  exists  a  finite  probability  that  the 
potential  barrier  can  be  crossed,  although,  from  a  classical  point  of  view,  the 
particle’s  energy  is  not  sufficient  to  overcome  the  barrier.  A  detailed  discussion  of 
this  effect  and  its  technological  importance  can  be  found  in  any  standard  textbook 
on quantum mechanics [19-21].

In conclusion we remark that a very prominent method for solving the
time-dependent Schrödinger equation numerically is based on the Fourier transformation.
The numerical implementation of the Fourier transformation as well as its
application to the Schrödinger equation is briefly discussed in Appendix D.

Fig. 11.9 Time evolution of the square modulus of the wave-function $|\psi(x)|^2$ vs $x$ (solid line, left
hand scale). The potential $V(x) = V_2(x)$ is also plotted vs $x$ (dashed line, right hand scale). We
present the results for $n = 500$, 1000, 1500, 2000, and 2500 time steps. The graph labeled by
$n = 0$ represents the initial configuration


Summary 

This  chapter  was  about  linear  PDEs  and  how  to  find  solutions  numerically.  The 
dominating  theme  was  the  application  of  the  various  finite  difference  methods. 
The  two-dimensional  POISSON  equation  served  as  an  example  for  an  elliptic  PDE. 
The  algorithm  to  solve  this  equation  developed  here  was  based  on  the  central 
difference derivative. Parabolic PDEs were represented by the time-dependent
one-dimensional heat equation. The numerical solution proved to be possible by either
using  an  explicit  or  an  implicit  Euler  scheme.  For  the  explicit  Euler  scheme  the 
appropriate  choice  of  time  and  space  discretization  proved  to  be  essential  for  the 
stability  of  the  algorithm.  The  one-dimensional  wave  equation  was  introduced  as 
an example of a hyperbolic PDE. The solution was found by employing an explicit
Euler approximation. Again time and space discretization had to follow a specific
stability criterion, the Courant-Friedrichs-Lewy condition. Finally, the
one-dimensional time-dependent Schrödinger equation was studied. It does not fit
into  any  of  the  above  categories.  The  algorithm  to  find  a  numerical  solution  was 
developed  here  on  the  basis  of  a  Crank-Nicolson  scheme  and  it  was  tested  with 
the  quantum  mechanical  tunneling  effect. 


Problems 


1. Write a program which solves the two-dimensional Poisson equation for an
arbitrary charge density distribution $\rho(x, y)$. Use the numerical method discussed
in Sect. 11.2.

a. Impose Dirichlet boundary conditions $\varphi(x, 0) = \varphi(x, L_y) = \varphi(0, y) =
\varphi(L_x, y) = 0$ as described in Sect. 11.2. Test the program by first reproducing
Fig. 11.2.

b.  Solve  the  POISSON  equation  for  different  charge  densities  of  your  choice. 

c.  Calculate  the  electric  field  E(x,  y)  with  the  help  of  Eq.  (11.3). 

2.  Calculate  the  time  evolution  of  the  temperature  distribution  T(x,t)  along  a 
cylindrical  rod  described  in  Sect.  9.3.  The  rod  is  kept  at  constant  temperatures 
$T_0$ and $T_N$ at its ends. The parameters used in Sect. 9.3 stay unchanged. Study
also  the  case  of  a  heat  sink  as  suggested  in  the  Problems  section  of  Chap.  9. 

3. Calculate the time evolution of the square modulus of the wave-function $|\psi(x)|^2$
vs $x$ for a potential $V_1(x)$ according to Eq. (11.78) with $V_0 < 0$ (quantum well).
In a second step, modify the potential according to

$$V(x) = \begin{cases} V_1 & x \in [a, b] \cup [c, d] , \\ V_2 & x \in [b, c] , \\ 0 & \text{elsewhere,} \end{cases}$$

with $V_1 > 0$, $V_2 < 0$, and $|V_1| < |V_2|$.


Part II

Stochastic  Methods 


Chapter  12 

Pseudo-random  Number  Generators 


12.1  Introduction 

Stochastic  methods  in  Computational  Physics  are  based  on  the  availability  of 
random  numbers  and  on  the  concepts  of  probability  theory.  (Readers  not  familiar 
with  the  basic  concepts  of  probability  theory  are  highly  encouraged  to  study 
Appendix  E  before  proceeding.)  The  required  random  numbers  are  provided  by 
numerical  random  number  generators  and,  thus,  we  have  to  speak,  more  precisely, 
of  pseudo-random  numbers.  Let  us  now  motivate  the  problem  and  discuss  some 
preliminary  items. 

A popular example of randomness in physical systems is certainly the outcome
of a dice-throw or the drawing of lotto numbers. Even though the outcome of a
dice-throw is completely determined by the initial conditions, it is effectively
unpredictable because the initial conditions cannot be determined accurately enough. A
probabilistic description, which assigns each of the outcomes 1, 2, 3, 4, 5, and 6 a
probability of 1/6, is much more convenient and promising. It has to
be kept in mind, of course, that all predictions obtained on the basis of such an
approach are also clearly probabilistic in their nature.

Another example is Brownian motion or diffusion which describes the random
motion of dust particles on a fluid surface. It is in this case particularly obvious
that a description with the help of a stochastic differential equation, such as the
Langevin equation [1], is completely sufficient and more to the point than a
description based on the dynamics of a large number of interacting particles.

Stochastic methods are not confined to physics: They are applied very
successfully in many other fields of expertise, like biology [2], economics [3, 4], medicine
[5], etc. Finally, an interesting and purely mathematical application can be found in
the evaluation of integrals as an alternative to the methods discussed in Chap. 3. This
method is referred to as Monte-Carlo integration and will be addressed in Chap. 14
together with a basic introduction to stochastics and its applications in physics.


From  the  numerical  point  of  view,  there  is  one  common  denominator  to  all 
these  applications:  Random  numbers  are  an  essential  tool  and,  consequently,  so  are 
random  number  generators.  Thus,  a  closer  inspection  of  randomness  in  general  and 
the  generation  of  random  numbers  or  sequences  in  particular  is  required.  We  have  to 
explain  what  we  understand  by  randomness  and  how  it  can  be  measured.  Moreover, 
based  on  this  discussion  we  have  to  formulate  requirements  to  be  imposed  on  the 
random  number  generators  to  deliver  useful  random  numbers. 

Although we might have an intuitive picture of randomness, it is hard to
formulate without relying on mathematics. For instance, consider the sequence $s_1$
which consists of $N$ elements:

$$s_1 = 0, 0, 0, 0, 0, \ldots , \qquad N \text{ elements.} \tag{12.1}$$

Is it random? The question cannot be answered without further information.
Suppose the numbers in sequence $s_1$ were drawn from some set $\mathcal{S}_1$. If this set is
of the form

$$\mathcal{S}_1 = \{0\} , \tag{12.2}$$

then the above sequence $s_1$ is certainly not random since there is only one possible
outcome. However, suppose the numbers of the sequence $s_1$ were drawn from the
set $\mathcal{S}_2$

$$\mathcal{S}_2 = \{0, 1\} , \tag{12.3}$$

with the events 0 and 1 together with assigned probabilities $P(0)$ and $P(1)$. These
probabilities describe the probability that the outcome of a measurement on the set
$\mathcal{S}_2$ yields either the event 0 or 1, respectively. For instance, in the case of tossing a
coin the event 0 may correspond to heads while 1 stands for tails. (To register this
result is, within this context, a measurement.) In this case the probabilities are given
by $P(0) = P(1) = 1/2$ under the premise that the coin is perfectly ideal and has
not been manipulated. Even if we know that the coin has not been manipulated,
sequence (12.1) is still a possible outcome, although it is rather improbable for a
large number $N$ of measurements (repeated tosses of the coin).

A literal definition of randomness within the context of a random sequence was
given by G. J. Chaitin [6]:

[...] a series of numbers is random if the smallest algorithm capable of specifying it to a
computer has about the same number of bits of information as the series itself.

This definition seems to include the most important features of randomness
which we are used to from our experience. Since no universal trend is observable,
reproducing the sequence requires the knowledge of every single constituent. Hence,
one may employ the sloppy definition: Randomness is the lack of a universal trend.

But how can we test whether or not a given sequence really follows a certain
distribution? Of course, one can simply exploit the statistical definition of probability,
Appendix, Eq. (E.4): The probability of a certain outcome is measured by counting
the particular results. This procedure seems to be quite promising; however, it has
to be kept in mind that the statistical definition of probability is only valid in the
limit $m \to \infty$, where $m$ is the number of measurements. Hence, it is fundamentally
impossible to determine whether or not a sequence is random because an infinite
number of elements would have to be evaluated and analyzed. More promising
appears to be the calculation of moments or correlations (see Appendix, Sects. E.2
and E.10) from the sequence and the comparison of such a result with known values
for real random numbers. These statistical tests will be discussed in Sect. 12.3.
If we consider the sequence (12.1) drawn from the set $\mathcal{S}_2$ [Eq. (12.3), uniform
distribution assumed] we can deduce that it is a very improbable result for large
$N$, although it is certainly a possible outcome. Methods based on this line of
reasoning are known as hypothesis tests and we discuss the $\chi^2$-test as a simple
representative of such tests in Sect. 12.3.

We are now in a position to clarify what we understand by a random number: We
regard a random sequence drawn from the set

$$\mathcal{S}_3 = \{0, 1, 2, 3, 4, 5, 6, 7, 8, 9\} . \tag{12.4}$$

If the random number is to be uniformly distributed, we assign probabilities $P(k) =
1/10$, $k = 0, 1, \ldots, 9$ and if we would like to obtain a random number out of the
interval

$$[0, 1) , \tag{12.5}$$

we may simply draw the sequence $s_2 = a_1, a_2, a_3, \ldots$ from $\mathcal{S}_3$ and compose the
random number $r$ as

$$r = 0.a_1 a_2 a_3 \ldots . \tag{12.6}$$
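
As a small worked illustration of Eq. (12.6), the following sketch composes $r$ from a finite list of decimal digits. The function name `compose_random_number` is hypothetical, the sequence is truncated after twelve digits, and Python's built-in generator merely stands in for the draw from the set in Eq. (12.4).

```python
import random


def compose_random_number(digits):
    """Return r = 0.a1a2a3... for a finite list of decimal digits, Eq. (12.6)."""
    n = 0
    for a in digits:
        n = 10 * n + a                 # build the integer a1a2a3...
    return n / 10 ** len(digits)       # shift the decimal point to the front


rng = random.Random(2024)              # assumed stand-in for drawing digits
digits = [rng.randrange(10) for _ in range(12)]
r = compose_random_number(digits)
print(digits, r)
```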

Section 12.2 is dedicated to the discussion of different methods of how to
generate so-called pseudo-random numbers. A pseudo-random number is a number
generated with the help of a deterministic algorithm which, nevertheless, behaves
as if it were random. This implies that its statistical properties are close to those of
true random numbers. In contrast to pseudo-random numbers, real random numbers
are truly random. A real random number can be obtained from experiments. One
could, for instance, simply toss a coin and register the resulting sequence of zeros
and ones. A more sophisticated method is to exploit the radioactive decay of a
nucleus, which is believed to be purely stochastic. There are also more exotic
ideas, such as using higher digits of $\pi$ which are assumed to behave as if they
were random. However, all these methods have in common that they are far too
slow for computational purposes. Moreover, an experimental approach is obviously
not reproducible in the sense that a random sequence cannot be reproduced on
demand, but the reproducibility of a random number sequence is essential for many
applications.

This leads us to the formulation of several criteria a random number generator
will have to comply with. It should

•  produce  pseudo-random  numbers  whose  statistical  properties  are  as  close  as 
possible  to  those  of  real  random  numbers. 

•  have  a  long  period:  It  should  generate  a  non-repeating  sequence  of  random 
numbers  which  is  sufficiently  long  for  computational  purposes. 

•  be  reproducible  in  the  sense  defined  above,  as  well  as  restartable  from  an  arbitrary 
break-point. 

•  be  fast  and  parallelizable:  It  should  not  be  the  limiting  component  in  simulations. 

We  restrict,  within  this  chapter,  our  discussion  to  random  numbers  that  are 
uniformly  distributed  over  a  finite  set.  Thus,  we  assign  to  all  possible  outcomes  of 
a  measurement  the  same  probability.  The  generation  of  non-uniformly  distributed 
random  numbers  from  uniformly  distributed  random  numbers  is  not  a  difficult  task 
and  will  be  discussed  in  more  detail  in  Chap.  13. 


12.2  Different  Methods 

We discuss here different types of pseudo-random number generators [7-9] which
generate a pseudo-random number $r$ which is uniformly distributed within the
interval $[0, 1)$. Hence, its probability density function (pdf) is given by

$$p(r) = \begin{cases} 1 & r \in [0, 1) , \\ 0 & \text{elsewhere,} \end{cases} \tag{12.7}$$

and from this follows the cumulative distribution function (cdf; see Appendix E):

$$P(r) = \int_0^r dr'\, p(r') = \begin{cases} 0 & r < 0 , \\ r & 0 \le r < 1 , \\ 1 & r \ge 1 . \end{cases} \tag{12.8}$$

We introduce here only some of the most basic concepts for pseudo-random
number generators. However, in huge simulations based on random numbers
standard pseudo-random number generators provided by the various compilers may
not be sufficient due to their rather short period and bad statistical properties. In
this case it is, therefore, recommended to consult the literature [7, 10] and use more
advanced techniques in order to obtain reliable results.


Linear Congruential Generators

Linear congruential generators are the simplest and most prominent random number
generators. They produce a sequence of integers $\{x_n\}$, $n \in \mathbb{N}$, following the rule

$$x_{n+1} = (a x_n + c) \bmod m , \tag{12.9}$$

where $a$ and $m$ are positive integers and $c$ is a non-negative integer, with $a, c < m$.
Furthermore, the generator is initialized by its seed $x_0$, which is also a positive (in
most cases odd) integer in the range $0 < x_0 < m$. The seed is commonly taken from, for
instance, the system time in order to avoid repetition at a restart of the sequence. In
many environments it is, therefore, necessary to fix the seed artificially whenever
one wants to perform reproducible tests.

We note that the sequence resulting from Eq. (12.9) is bounded to the interval
$x_n \in [0, m-1]$ and, hence, its maximum period is $m$. However, the actual period
of the sequence highly depends on the choices of the parameters $a$, $c$ and $m$ as
well as on the seed $x_0$. In general, linear congruential generators are very fast and
simple; however, they have rather short periods. Moreover, they are very susceptible
to correlations since the value $x_{n+1}$ is calculated from $x_n$ only. (This is obviously a
property which does not apply to real independent random numbers and it should
therefore be eliminated!) In Sect. 12.3 we will discuss some simple methods which
allow us to identify such correlations.

One of the most prominent choices for the parameters in Eq. (12.9) are the
Park-Miller parameters [10]:

$$a = 7^5 , \qquad c = 0 , \qquad m = 2^{31} - 1 . \tag{12.10}$$

Note that one has to be particularly careful when choosing $c = 0$. It follows from
Eq. (12.9) that if $c = 0$ and if for any $n$, $x_n = 0$ one obtains $x_k = 0$ for all $k > n$.
The random numbers $r_n$ described by the pdf (12.7) are obtained via

$$r_n = \frac{x_n}{m} \in [0, 1) . \tag{12.11}$$
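
Equations (12.9) to (12.11) translate almost literally into code. The sketch below uses Python generators; the names `lcg_integers` and `lcg_uniform` are chosen here for illustration, and the Park-Miller parameters are only the default values.

```python
from itertools import islice


def lcg_integers(seed, a=7**5, c=0, m=2**31 - 1):
    """Yield x_1, x_2, ... from x_{n+1} = (a x_n + c) mod m, Eq. (12.9)."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x


def lcg_uniform(seed, a=7**5, c=0, m=2**31 - 1):
    """Yield r_n = x_n / m in [0, 1), Eq. (12.11)."""
    for x in lcg_integers(seed, a, c, m):
        yield x / m


# Fixing the seed makes the sequence reproducible.
print(list(islice(lcg_integers(1), 3)))
print(list(islice(lcg_uniform(1), 3)))
```

With the seed fixed, two runs produce the same numbers, which is the reproducibility demanded above. Python integers do not overflow, so the product $a x_n$ needs no special treatment here; in languages with 32-bit integers this multiplication requires care.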

Let us briefly discuss two famous improvements which concentrate on the
reduction of correlations and an elongation of the period: The first idea, which is
referred to as shuffling [10], includes a second pseudo-random step. One calculates
$N$ numbers $r_n$ from Eqs. (12.9) and (12.11) and stores these numbers in an array.
If a random number is needed by the executing program, a second random integer
$k \in [1, N]$ is drawn and the $k$-th element is taken from this array. In order to avoid
that the same random number is used again, the $k$-th element of the array is replaced
by a new random number which, again, is calculated from (12.9) and (12.11).
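
The shuffling idea described above can be sketched as follows. The text does not specify how the second random integer $k$ is drawn; as an assumption of this sketch it is taken from the same linear congruential generator. The table size of 32 and the function names are likewise illustrative choices, and the array index runs from 0 to $N-1$ instead of 1 to $N$.

```python
from itertools import islice


def lcg_uniform(seed, a=7**5, c=0, m=2**31 - 1):
    """Uniform numbers r_n = x_n / m from Eqs. (12.9) and (12.11)."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x / m


def shuffled_uniform(seed, table_size=32):
    """Return numbers from a table that is refilled after every access."""
    source = lcg_uniform(seed)
    table = [next(source) for _ in range(table_size)]   # store N numbers
    while True:
        k = int(next(source) * table_size)   # second random integer, 0..N-1
        r = table[k]                         # take the k-th element
        table[k] = next(source)              # replace it by a new number
        yield r


print(list(islice(shuffled_uniform(1), 5)))
```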

The second idea to improve the linear congruential generator is simply to include
more previous elements of the sequence:

$$x_{n+1} = \left( \sum_{k=0}^{\ell} a_k x_{n-k} \right) \bmod m , \tag{12.12}$$

where $\ell > 0$ and $a_\ell \neq 0$. Again, the periodicity depends highly on the choice of
the parameters and on the seed. A specific variation of random number generators
using Eq. (12.12) are the Fibonacci generators.


Fibonacci  Generators 

The Fibonacci sequence is given by

$$x_{n+1} = x_n + x_{n-1} , \qquad x_0 = 0 , \quad x_1 = 1 , \tag{12.13}$$

which results for $n \ge 1$ in

$$1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, \ldots . \tag{12.14}$$

Choosing in Eq. (12.12) $m = 10$, $\ell = 1$ and $a_0 = a_1 = 1$ simply leaves the last
digits of the sequence (12.14):

$$1, 1, 2, 3, 5, 8, 3, 1, 4, 5, 9, \ldots . \tag{12.15}$$
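
The sequence (12.15) is easily reproduced with a direct transcription of Eq. (12.12). The helper `multi_step_generator` is a name introduced here for illustration; it takes the coefficients $a_0, \ldots, a_\ell$ and the starting values $x_0, \ldots, x_\ell$.

```python
from itertools import islice


def multi_step_generator(coeffs, start, m):
    """Yield x_{n+1} = (sum_k a_k x_{n-k}) mod m, Eq. (12.12).

    coeffs = [a_0, ..., a_l], start = [x_0, ..., x_l] (oldest value first).
    """
    state = list(start)
    while True:
        # a_0 multiplies the newest element x_n, a_l the oldest one x_{n-l}
        new = sum(a * x for a, x in zip(coeffs, reversed(state))) % m
        state = state[1:] + [new]
        yield new


# m = 10, l = 1, a_0 = a_1 = 1, x_0 = 0, x_1 = 1
last_digits = [1] + list(islice(multi_step_generator([1, 1], [0, 1], 10), 10))
print(last_digits)   # [1, 1, 2, 3, 5, 8, 3, 1, 4, 5, 9]
```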

This suggests the definition of a pseudo-random number generator based on the
Fibonacci sequence [11]. It is of the form

$$x_{n+1} = (x_n + x_{n-1}) \bmod m , \tag{12.16}$$

which, according to our previous discussion, allows a periodicity exceeding $m$. A
straightforward generalization results in the so-called lagged Fibonacci generators:

$$x_{n+1} = (x_{n-p} \otimes x_{n-q}) \bmod m , \tag{12.17}$$

where $p, q \in \mathbb{N}$ and the operator $\otimes$ stands for any binary operation, such as addition,
subtraction, multiplication or some logical operation. Two of the most popular
lagged Fibonacci generators are the shift register generator and the
Marsaglia-Zaman generator.

The shift register generator is based on the exclusive or (XOR; $\oplus$) operation,
which acts on each bit of the numbers $x_{n-p}$ and $x_{n-q}$. In particular, the recurrence
relation reads

$$x_n = x_{n-p} \oplus x_{n-q} . \tag{12.18}$$

The XOR operation $\oplus$ is shown in the following truth table:

    a    b    a ⊕ b
    0    0      0
    1    0      1
    0    1      1
    1    1      0

Hence, suppose that the binary representation of $x_{n-p}$ is of the form 01001110...
and for $x_{n-q}$ we have 11110011... Then we get from Eq. (12.18):

    x_{n-p}    0 1 0 0 1 1 1 0 ...
    x_{n-q}    1 1 1 1 0 0 1 1 ...
    x_n        1 0 1 1 1 1 0 1 ...

A very prominent choice is given by $p = 250$ and $q = 103$ which yields a
superior periodicity of order $\mathcal{O}(10^{75})$. The algorithm is initialized with the help
of, for instance, a linear congruential generator.
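
A minimal sketch of the shift register generator (12.18) with $p = 250$ and $q = 103$ is given below. The circular buffer, the function name and the initialization with Park-Miller integers are assumptions of this illustration; the text only states that a linear congruential generator may be used for the initialization. Uniform numbers could be obtained by dividing the integers by $2^{31}$.

```python
from itertools import islice


def shift_register_integers(seed, p=250, q=103):
    """Yield x_n = x_{n-p} XOR x_{n-q}, Eq. (12.18), for p > q."""
    m = 2**31 - 1
    x = seed
    history = []                       # the p most recent values
    for _ in range(p):                 # assumed initialization: Park-Miller LCG
        x = (7**5 * x) % m
        history.append(x)
    oldest = 0                         # buffer position of x_{n-p}
    while True:
        # x_{n-q} is stored p - q places after the oldest element
        new = history[oldest] ^ history[(oldest + p - q) % p]
        history[oldest] = new          # overwrite x_{n-p} with x_n
        oldest = (oldest + 1) % p
        yield new


# The bitwise example from the text:
print(format(0b01001110 ^ 0b11110011, "08b"))   # 10111101
print(list(islice(shift_register_integers(12345), 3)))
```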

In contrast, the Marsaglia-Zaman generator (subtract-with-borrow scheme)
[12] uses the subtraction operation and may be written, by introducing the so-called
carry bit $c_n$, as

$$\Delta = x_{n-p} - x_{n-q} - c_{n-1} , \tag{12.19}$$

where $x_i \in [0, m]$ for all $i$. Then,

$$x_n = \begin{cases} \Delta & \Delta \ge 0 , \\ \Delta + m & \Delta < 0 , \end{cases} \tag{12.20}$$

and $c_n$ is obtained via

$$c_n = \begin{cases} 0 & \Delta \ge 0 , \\ 1 & \Delta < 0 . \end{cases} \tag{12.21}$$

For the particular choice $p = 10$, $q = 24$ and $m = 2^{24}$ one finds an amazingly large
periodicity of order $\mathcal{O}(10^{171})$. The random numbers $x_n$ are integers in the interval
$x_n \in [0, m]$, hence dividing the random numbers by $m$ gives $r_n \in [0, 1]$.
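
Equations (12.19) to (12.21) can be sketched as follows. The function name, the use of a `deque` as history buffer and the requirement that the caller supplies the starting values are assumptions of this illustration. The toy parameters in the last lines are chosen only so that the carry mechanism can be followed by hand.

```python
from collections import deque
from itertools import islice


def subtract_with_borrow(start, p=10, q=24, m=2**24):
    """Yield the integers x_n of Eqs. (12.19)-(12.21).

    start holds the max(p, q) most recent values, oldest first.
    """
    lag = max(p, q)
    if len(start) != lag:
        raise ValueError("need exactly max(p, q) starting values")
    history = deque(start, maxlen=lag)      # history[-1] is x_{n-1}
    carry = 0                               # c_{n-1}, assumed to start at 0
    while True:
        delta = history[-p] - history[-q] - carry      # Eq. (12.19)
        if delta >= 0:
            x, carry = delta, 0                        # Eqs. (12.20), (12.21)
        else:
            x, carry = delta + m, 1
        history.append(x)                   # the oldest value drops out
        yield x


# Toy example: p = 1, q = 2, m = 10, x_{n-2} = 3, x_{n-1} = 5
print(list(islice(subtract_with_borrow([3, 5], p=1, q=2, m=10), 5)))
# [2, 7, 4, 7, 2]
```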
12.3 Quality Tests

Here, we discuss some tests to check whether or not a given, finite sequence of
numbers $x_n$ consists of uniformly distributed random numbers out of the interval
$x_n \in [0, 1]$.¹


Statistical  Tests 


Statistical tests are generally the simplest methods to arrive at a first idea of the
quality of a pseudo-random number generator. Statistical tests are typically based
on the calculation of moments or correlations. Since we regard the simplified case
of uniformly distributed, uncorrelated random numbers within the interval $[0, 1]$, the
moments can be calculated immediately from (see Appendix, Sect. E.2)

$$\langle X^k \rangle = \int dx\, x^k p(x) = \int_0^1 dx\, x^k = \frac{1}{k+1} , \tag{12.22}$$

for $k \in \mathbb{N}$. These moments are approximated using the generated finite sequence of
numbers $\{x_n\}_{n=1,\ldots,N}$ via

$$\overline{x^k} = \frac{1}{N} \sum_{n=1}^{N} x_n^k . \tag{12.23}$$

As illustrated in Appendix, Sect. E.2, the error of this approximation is of order
$N^{-1/2}$:

$$\langle X^k \rangle = \overline{x^k} + \mathcal{O}\left(\frac{1}{\sqrt{N}}\right) . \tag{12.24}$$
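
The moment test of Eqs. (12.22) and (12.23) can be sketched in a few lines. As an assumption of this sketch, Python's built-in generator (with a fixed seed) serves as the generator under test; any other sequence of numbers in $[0, 1]$ could be passed to the hypothetical helper `sample_moment` instead.

```python
import random


def sample_moment(numbers, k):
    """Estimate the k-th moment via Eq. (12.23)."""
    return sum(x**k for x in numbers) / len(numbers)


rng = random.Random(12345)                  # fixed seed for reproducibility
N = 100_000
numbers = [rng.random() for _ in range(N)]

for k in range(1, 5):
    exact = 1.0 / (k + 1)                   # Eq. (12.22)
    estimate = sample_moment(numbers, k)
    # the deviation should be of order 1 / sqrt(N), Eq. (12.24)
    print(k, exact, estimate, abs(estimate - exact) * N**0.5)
```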


Another method studies correlations (see Appendix, Sects. E.2 and E.10)
between the random numbers of the sequence and compares them with the analytical
result. We obtain for uncorrelated random numbers:

$$\langle X_n X_{n+k} \rangle = \langle X_n \rangle^2 = \frac{1}{4} . \tag{12.25}$$
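
The correlation test of Eq. (12.25) can be sketched analogously. The estimator below, a plain average of the products $x_n x_{n+k}$ over the available pairs, and the use of Python's built-in generator as test object are assumptions of this sketch.

```python
import random


def lag_correlation(numbers, k):
    """Average of x_n * x_{n+k} over all available pairs."""
    pairs = len(numbers) - k
    return sum(numbers[n] * numbers[n + k] for n in range(pairs)) / pairs


rng = random.Random(2718)                   # fixed seed for reproducibility
numbers = [rng.random() for _ in range(100_000)]

for k in (1, 2, 5, 10):
    # for uncorrelated uniform numbers the result should be close to 1/4
    print(k, lag_correlation(numbers, k))
```

For $k = 0$ the same function returns the second moment, which is $1/3$ rather than $1/4$; the difference is precisely the variance $1/12$ of the uniform distribution.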


¹ From now on we define quite generally the interval out of which random numbers $x_n$ are drawn by
$x_n \in [0, 1]$, keeping in mind that this interval depends on the actual method applied. This method
determines whether zero or one is contained in the interval.

Fig. 12.1 Spectral test for a linear congruential generator. We used (a) the Park-Miller parameters
$a = 7^5$, $c = 0$, and $m = 2^{31} - 1$, and (b) $a = 137$, $c = 0$, and $m = 2^{11}$, and plotted $N = 10^3$
subsequent pairs $(x_n, x_{n+1})$ of random numbers. In frame (a) the random numbers appear nicely
distributed within the unit square, showing no obvious correlations. On the other hand, in frame
(b) subsequent random numbers lie on hyperplanes and, thus, develop correlations: They do not
fill the unit square uniformly


Another quite evident test is the analysis of the symmetry of the distribution. If
$x_n \in [0, 1]$ is uniformly distributed then it follows that $(1 - x_n) \in [0, 1]$ should also
be uniformly distributed.

Finally, we discuss a graphical test, known as the spectral test [7]. The spectral
test consists of plotting subsequent random numbers $x_n$ vs $x_{n+1}$ and of visual
inspection of the result. One expects the random numbers to uniformly fill the unit
square; however, if correlations exist, particular patterns might evolve. We illustrate
this method in Fig. 12.1 where it is applied to a linear congruential generator (12.9)
with two different sets of parameters.
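
The data for such a plot can be produced as sketched below for the two parameter sets of Fig. 12.1. The helper names and the seed $x_0 = 1$ are assumptions; the plotting itself is left to whatever graphics tool is at hand. Counting the distinct pairs already hints at the problem of the second parameter set without any plot.

```python
def lcg_uniform_list(seed, a, c, m, count):
    """Return the first `count` numbers r_n = x_n / m of the LCG (12.9)."""
    numbers, x = [], seed
    for _ in range(count):
        x = (a * x + c) % m
        numbers.append(x / m)
    return numbers


def successive_pairs(numbers):
    """Return the pairs (x_n, x_{n+1}) used in the spectral test."""
    return list(zip(numbers[:-1], numbers[1:]))


good = successive_pairs(lcg_uniform_list(1, 7**5, 0, 2**31 - 1, 1001))
bad = successive_pairs(lcg_uniform_list(1, 137, 0, 2**11, 1001))

# 1000 pairs each; a short period shows up as repeated pairs
print(len(set(good)), len(set(bad)))
```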


Hypothesis  Testing 

Basically, one could employ different hypothesis tests, such as the
Kolmogorov-Smirnov test, to test random numbers. These tests are rather basic and are
discussed in numerous books on statistics. In what follows we shall briefly mention
the $\chi^2$-test; for more advanced techniques we refer the reader to the literature
[13, 14].

The $\chi^2$-test tests the pdf directly. One starts by sorting the $N$ elements of the
sequence into a histogram. Suppose we would like to have $M$ bins and, hence,
the width of every bin is given by $1/M$. We now count the number of elements
which lie within bin $k$, i.e. within the interval $[(k-1)/M, k/M]$, and denote this
number by $n_k$. The histogram array $h$ is given by $h = c\,(n_1, n_2, \ldots, n_M)^T$ where
the constant $c = M/N$ normalizes the histogram. In Fig. 12.2 we show three
different histograms for $N = 10^5$, $N = 10^6$ and $N = 10^7$ uniformly distributed
random numbers as obtained with the Park-Miller linear congruential generator.
In Fig. 12.3 we present a histogram for $N = 10^7$ obtained with the bad linear
congruential generator defined in Fig. 12.1b. We recognize numerous empty bins
which are a clear indication that the random numbers are not uniformly distributed.

Fig. 12.2 Histograms for $N = 10^5$, $N = 10^6$ and $N = 10^7$, $M = 100$ bins as obtained with the
Park-Miller linear congruential generator, $a = 7^5$, $c = 0$ and $m = 2^{31} - 1$

Fig. 12.3 Histogram for $N = 10^7$ and $M = 100$ bins as obtained with a linear congruential
generator with parameters $a = 137$, $c = 0$ and $m = 2^{11}$

Let us briefly recall some points from probability theory [15, 16]. One can
show that if the numbers $Q_n$ are independent, standard normally distributed random
variables, the sum of their squares

$$x = \sum_{n=1}^{\nu} Q_n^2 \tag{12.26}$$

follows a $\chi^2$-distribution where $\nu$ is the number of degrees of freedom. The pdf of
the $\chi^2$-distribution is given by

$$p(x; \nu) = \frac{x^{\nu/2 - 1}\, e^{-x/2}}{2^{\nu/2}\, \Gamma(\nu/2)} , \qquad x \ge 0 , \tag{12.27}$$

where $\Gamma(\cdot)$ denotes the $\Gamma$-function. The probability of finding the variable $x$ within
the interval $[a, b] \subset \mathbb{R}^+$ can be calculated as

$$P(x \in [a, b]; \nu) = \int_a^b dx\, p(x; \nu) , \tag{12.28}$$

and in particular for $a = 0$ we obtain

$$P(x < b; \nu) = \int_0^b dx\, p(x; \nu) = F(b; \nu) . \tag{12.29}$$

Here we introduced the cdf $F(b; \nu)$. Let us consider the inverse problem: the
probability that $x < b$ is equal to $\alpha$, i.e. $F(b; \nu) = \alpha$. We then calculate the upper
bound $b$ by inverting Eq. (12.29) and obtain:

$$b = F^{-1}(\alpha; \nu) . \tag{12.30}$$

These values are tabulated [17, 18].

We return to our particular example: the hypothesis is that the sequence $\{x_n\}$
generated by some pseudo-random number generator complies with a uniform
distribution. It is a consequence of the central limit theorem (see Appendix, Sect. E.8) that
the deviations from the theoretically expected values $n_k^{\mathrm{th}}$ obey a normal distribution.
For the uniform distribution considered here, $n_k^{\mathrm{th}} = N/M$ for every bin.
We define the variable

$$\chi^2 = \sum_{k=1}^{M} \frac{\left(n_k - n_k^{\mathrm{th}}\right)^2}{n_k^{\mathrm{th}}} . \tag{12.31}$$

If our hypothesis is true, $\chi^2$ follows a $\chi^2$-distribution with $\nu = M - 1$ because the
requirement that the sum of all numbers $n_k$ is equal to $N$ reduces the degrees of
freedom by one. We employ relation (12.30) for $\alpha = 0.85$ and $\nu = 99$ and obtain
$b \approx 113$. Hence, values $\chi^2 < b$ are very likely if $\chi^2$ really follows a $\chi^2$-distribution,
while values $\chi^2 > b$ are unlikely and therefore the hypothesis may require a review.
However, it has to be emphasized that it is fundamentally impossible to verify
a hypothesis. It can only be falsified or strengthened. We note that the resulting
value of $\chi^2$ will highly depend on the seed number of the generator as long as the
maximum period has not been reached.
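
The test can be sketched as follows for $M = 100$ bins, using the threshold $b \approx 113$ quoted above. The function names, the sample size $N = 25600$ and the seeds are illustrative assumptions. Bins are taken as half-open intervals $[(k-1)/M, k/M)$ so that every number falls into exactly one bin.

```python
from itertools import islice


def lcg_uniform(seed, a, c, m):
    """Uniform numbers r_n = x_n / m from the LCG (12.9)."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x / m


def chi_square_uniform(numbers, M):
    """Evaluate Eq. (12.31) for the hypothesis of a uniform distribution."""
    counts = [0] * M
    for r in numbers:
        counts[min(int(r * M), M - 1)] += 1     # n_k; r = 1 goes to the last bin
    expected = len(numbers) / M                 # n_k^th = N / M
    return sum((n - expected)**2 / expected for n in counts)


N, M, b = 25_600, 100, 113.0
good = list(islice(lcg_uniform(3141549, 7**5, 0, 2**31 - 1), N))
bad = list(islice(lcg_uniform(1, 137, 0, 2**11), N))

chi2_good = chi_square_uniform(good, M)
chi2_bad = chi_square_uniform(bad, M)
print(chi2_good, chi2_good < b)
print(chi2_bad, chi2_bad < b)
```

A single value above $b$ does not prove that a generator is bad, and a value below $b$ does not prove that it is good; in practice the test would be repeated for several seeds and sample sizes.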


Summary 

We  first  concentrated  on  a  possible  definition  of  randomness  and  on  a  mathematical 
definition  of  random  numbers  and  sequences.  As  the  generation  of  random  numbers 
was  the  main  topic  of  this  section  we  moved  on  to  describe  the  requirements  an 
‘ideal’  random  number  generator  will  have  to  obey.  On  the  other  hand,  the  numerics 
of  computational  physics  demanded  reproducible  sequences  of  random  numbers 
and  this  resulted  in  the  notion  of  ‘pseudo’  random  numbers  which  will  be  generated 
by deterministic methods and, thus, cannot possibly be ‘ideal’. A number of rather
simple but quite effective pseudo-random number generators were discussed before
the question of how to test the quality (randomness) of these numbers was raised.
We discussed statistical tests and demonstrated the simple spectral test using a linear
congruential generator. More sophisticated is the method of quality testing. The
histogram technique as a direct test for the probability density function from which
the random numbers are drawn was discussed in detail. Finally, some basics of the
$\chi^2$-test were presented.


Problems 

1. Write the computer code for a linear congruential generator. This generator is
described by

$$x_{j+1} = (a x_j + c) \bmod m .$$

The random numbers $r_j \in [0, 1]$ can be obtained by normalizing $x_j$ as was
discussed in Sect. 12.2. Use the following parameters

a. $a = 16807$, $c = 0$, $m = 2^{31} - 1$, $x_0 = 3141549$,

b. $a = 5$, $c = 0$, $m = 2^7$, $x_0 = 1$.

2. Perform the following analysis:

a. Compute the mean $\langle r \rangle$ and the variance $\mathrm{var}(r)$ for random numbers generated
in $N$ steps. Plot the result.

b. Plot two successive random numbers $r_k$ versus $r_{k+1}$ for $k = 1, 2, \ldots, N - 1$ in
a two-dimensional plot.

c. Repeat the above steps for random numbers generated by your system’s
software.

d. Discuss the results!
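
As a possible starting point for Problem 1, the recurrence can be written as a Python generator. This is a minimal sketch, not a reference solution: the names `lcg`, `lcg_uniform` and `period` are hypothetical, and the normalization $r_j = x_j / m$ is one common choice that we assume here.

```python
def lcg(a, c, m, seed):
    """Yield x_1, x_2, ... from x_{j+1} = (a * x_j + c) mod m, starting at x_0 = seed."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x

def lcg_uniform(a, c, m, seed, n):
    """Return n numbers r_j = x_j / m (assumed normalization)."""
    gen = lcg(a, c, m, seed)
    return [next(gen) / m for _ in range(n)]

def period(a, c, m, seed):
    """Count steps until the state returns to the seed.

    Assumes the seed lies on the cycle, which holds when a is invertible mod m.
    """
    gen = lcg(a, c, m, seed)
    steps = 1
    while next(gen) != seed:
        steps += 1
    return steps

if __name__ == "__main__":
    print(lcg_uniform(16807, 0, 2**31 - 1, 3141549, 5))  # parameter set (a)
    print(period(5, 0, 2**7, 1))                         # parameter set (b)
```

For parameter set (b) the sequence repeats after only 32 steps, which is worth keeping in mind when interpreting the plots of Problem 2.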
Chapter 13

Random  Sampling  Methods 


13.1  Introduction 

Most applications require random numbers that follow a particular
probability density function (pdf) other than the uniform distribution on the interval
[0, 1]. We present in this chapter methods to generate random numbers that follow
some arbitrary pdf. Uniformly distributed random numbers, generated with the help
of the methods we discussed in Chap. 12, will serve as the source.

The two most prominent techniques to generate random numbers from an
arbitrary distribution are the inverse transformation method and the rejection
method. They will be discussed in Sects. 13.2 and 13.3, respectively. In addition,
we comment briefly in Sect. 13.4 on sampling from piecewise defined pdfs and
combined pdfs. It has to be emphasized that these methods are in many cases not
sufficient and a more powerful approach is required. One of these is based on the
idea of importance sampling and is referred to as the Metropolis method. It will
be discussed briefly in Chap. 14.

Nevertheless, it is also possible to obtain random numbers for some specific pdfs
quite easily by direct sampling [1]. For instance, suppose $x_1, x_2$ are two uniformly
distributed random numbers. Hence, their pdf is given by

$$p_u(x) = \begin{cases} 1 & x \in [0, 1] , \\ 0 & \text{elsewhere,} \end{cases} \tag{13.1}$$

and the corresponding cumulative distribution function (cdf) follows:

$$P_u(x) = \begin{cases} 0 & x < 0 , \\ x & x \in [0, 1] , \\ 1 & x > 1 . \end{cases} \tag{13.2}$$

One can prove that the new random number $y$,

$$y = \max(x_1, x_2) , \tag{13.3}$$

conforms to the cdf

$$F(y) = y^2 , \tag{13.4}$$

and to the pdf¹:

$$f(y) = 2y . \tag{13.5}$$

The consequence is an elegant method to generate random numbers $z$ which follow
the pdf

$$g(z) = k z^{k-1} , \tag{13.6}$$

by defining

$$z = \max(x_1, x_2, \ldots, x_k) . \tag{13.7}$$

Here, the random numbers $x_i$ are uniformly distributed and can be obtained with the
help of the methods introduced in Chap. 12.
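
The maximum construction of Eq. (13.7) translates directly into code. The sketch below is illustrative only; the function name `sample_power_law` is hypothetical and Python's built-in generator stands in for the uniform generators of Chap. 12. The check compares the sample mean with the exact mean $k/(k+1)$ of the pdf $g(z) = k z^{k-1}$.

```python
import random

def sample_power_law(k, rng):
    """Draw z = max(x_1, ..., x_k), x_i uniform on [0, 1); z follows g(z) = k z^(k-1)."""
    return max(rng.random() for _ in range(k))

if __name__ == "__main__":
    rng = random.Random(2024)                 # hypothetical seed
    k = 3
    zs = [sample_power_law(k, rng) for _ in range(100_000)]
    print("sample mean:", sum(zs) / len(zs), "exact:", k / (k + 1))
```

Each sample costs $k$ uniform random numbers, so for large $k$ the inverse transformation method of Sect. 13.2 is cheaper.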

Another equally elegant method can be employed to calculate random numbers
which follow a normal distribution:

$$p(z) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{z^2}{2}\right) . \tag{13.8}$$

Again, we act on the assumption that the random numbers $x_i$ are uniformly
distributed within the unit interval [0, 1]. We take two random numbers $(x_1, x_2)$ and
construct two random numbers $(z_1, z_2)$ using the transformation:

$$z_1 = \cos(2\pi x_2) \sqrt{-2 \ln x_1} , \qquad z_2 = \sin(2\pi x_2) \sqrt{-2 \ln x_1} . \tag{13.9}$$


¹This follows from the transformation of pdfs (see Chap. 14):

$$f(y) = \int_0^1 \mathrm{d}x_1 \int_0^1 \mathrm{d}x_2\, \delta[y - \max(x_1, x_2)] = 2y .$$

It is easy to demonstrate that $(z_1, z_2)$ follow the pdf (13.8). We introduce the
joint distribution $p_u(x_1, x_2) = p_u(x_1) p_u(x_2)$ (assumption of no correlations). The
transformation of probabilities [2-4] gives

$$p(z_1, z_2)\, \mathrm{d}z_1 \mathrm{d}z_2 = p_u(x_1, x_2)\, \mathrm{d}x_1 \mathrm{d}x_2 , \tag{13.10}$$

or, equivalently, in terms of the Jacobian determinant

$$p(z_1, z_2) = \left| \frac{\partial(x_1, x_2)}{\partial(z_1, z_2)} \right| , \tag{13.11}$$


where we employed Eq. (13.1). We recognize that Eq. (13.9) is equivalent to

$$x_1 = \exp\left(-\frac{z_1^2 + z_2^2}{2}\right) , \qquad x_2 = \frac{1}{2\pi} \tan^{-1}\left(\frac{z_2}{z_1}\right) . \tag{13.12}$$


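The Jacobian determinant is readily evaluated and gives²:

$$\left| \frac{\partial(x_1, x_2)}{\partial(z_1, z_2)} \right|
= \left| \det \begin{pmatrix} \frac{\partial x_1}{\partial z_1} & \frac{\partial x_1}{\partial z_2} \\ \frac{\partial x_2}{\partial z_1} & \frac{\partial x_2}{\partial z_2} \end{pmatrix} \right|
= \left| \det \begin{pmatrix} -z_1 x_1 & -z_2 x_1 \\ -\frac{z_2}{2\pi(z_1^2 + z_2^2)} & \frac{z_1}{2\pi(z_1^2 + z_2^2)} \end{pmatrix} \right|
= \frac{x_1}{2\pi}
= \frac{1}{2\pi} \exp\left(-\frac{z_1^2 + z_2^2}{2}\right)
= p(z_1) p(z_2) . \tag{13.13}$$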


This is the product of two normal distributions and, thus, $z_1$ and $z_2$ indeed follow a
normal distribution.
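
The transformation (13.9) is commonly known as the Box-Muller transformation (a name the text itself does not use). A minimal Python sketch follows; the function name `box_muller` is hypothetical, and $x_1$ is taken from the half-open interval $(0, 1]$ to avoid evaluating $\ln 0$, which is our own precaution.

```python
import math
import random

def box_muller(rng):
    """Return two independent standard-normal numbers (z_1, z_2) via Eq. (13.9)."""
    x1 = 1.0 - rng.random()   # in (0, 1], so that log(x1) is finite
    x2 = rng.random()
    radius = math.sqrt(-2.0 * math.log(x1))
    angle = 2.0 * math.pi * x2
    return radius * math.cos(angle), radius * math.sin(angle)

if __name__ == "__main__":
    rng = random.Random(42)                   # hypothetical seed
    zs = [z for _ in range(50_000) for z in box_muller(rng)]
    mean = sum(zs) / len(zs)
    var = sum((z - mean) ** 2 for z in zs) / len(zs)
    print(f"mean = {mean:.3f}, variance = {var:.3f}")
```

A normal distribution with mean $x_0$ and standard deviation $\sigma$ is then obtained from $x_0 + \sigma z$.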


²We make use of:

$$\frac{\mathrm{d}}{\mathrm{d}x} \tan^{-1}(x) = \frac{1}{1 + x^2} .$$

13.2 Inverse Transformation Method

The inverse transformation method is one of the simplest and most useful methods
to sample random variables from an arbitrary pdf [1, 5-7]. Let $p(x)$, $x \in [x_{\min}, x_{\max}]$,
denote the pdf from which we want to obtain our random numbers. The corresponding
cdf will be denoted by

$$P(x) = \int_{x_{\min}}^{x} \mathrm{d}x'\, p(x') . \tag{13.14}$$


It follows immediately from the positivity and the normalization condition of pdfs
(Appendix, Sect. E.5) that $P(x)$ is monotonically increasing and, furthermore, that
$P(x_{\min}) = 0$ and $P(x_{\max}) = 1$. Let $\xi$ denote some random number uniformly
distributed within the interval [0, 1]. We obtain from the conservation of probability
[2-4]

$$p_u(\xi)\, \mathrm{d}\xi = p(x)\, \mathrm{d}x \quad \Rightarrow \quad 1 = p_u(\xi) = p(x) \left| \frac{\mathrm{d}\xi}{\mathrm{d}x} \right|^{-1} , \tag{13.15}$$


which can be solved by the choice $\xi = P(x)$, since

$$\frac{\mathrm{d}}{\mathrm{d}x} P(x) = p(x) . \tag{13.16}$$

Hence, we arrive at

$$x = P^{-1}(\xi) , \tag{13.17}$$


where $P^{-1}$ denotes the inverse of $P$. It is an obvious drawback of this method
that it requires the inverse $P^{-1}(\xi)$ to exist and that $P(x)$ must be calculated and
inverted analytically. This is, for instance, not possible in the case of the normal
distribution (13.8).

Let  us  illustrate  this  method  with  two  simple  examples: 

1. Suppose we want to draw random numbers which are uniformly distributed
within the interval $[a, b]$. The corresponding pdf reads

$$p(x) = \frac{1}{b - a} , \qquad x \in [a, b] , \tag{13.18}$$

and the cdf takes on the form

$$P(x) = \int_a^x \mathrm{d}x'\, p(x') = \frac{x - a}{b - a} , \tag{13.19}$$

where we set, in this particular example, $x_{\min} = a$. Hence, we have

$$\xi = \frac{x - a}{b - a} , \tag{13.20}$$

which is uniformly distributed within [0, 1]. Consequently, we determine uniformly
distributed random numbers $x \in [a, b]$ via

$$x = a + (b - a)\xi . \tag{13.21}$$

2. We are interested in random numbers $x$ drawn from a pdf given by the exponential
distribution:

$$p(x) = \frac{1}{\lambda} \exp\left(-\frac{x}{\lambda}\right) . \tag{13.22}$$

Here, $\lambda > 0$ and $x \in [0, \infty)$. These could, for instance, describe the free path
$x$ of a particle between interactions, where the mean free path is $\langle x \rangle = \lambda$. From
Eq. (13.17) we obtain

$$\xi = P(x) = 1 - \exp\left(-\frac{x}{\lambda}\right) , \tag{13.23}$$

and consequently the relation

$$x = -\lambda \ln(1 - \xi) \tag{13.24}$$

gives random variables $x$ which comply with the exponential distribution (13.22) if
$\xi$ follows the pdf $p_u(\xi)$ of Eq. (13.1). Moreover, since $1 - \xi$ is uniformly distributed
within [0, 1] whenever $\xi$ is, it follows from this symmetry of the uniform distribution
that we may equally use

$$x = -\lambda \ln(\xi) , \tag{13.25}$$

without affecting the distribution of the resulting random numbers. In Fig. 13.1 we
show a histogram with random numbers drawn according to (13.25).
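
The second example can be sketched as follows. The function name `sample_exponential` is hypothetical; the only deviation from Eq. (13.25) is that $\xi$ is drawn from $(0, 1]$ rather than $[0, 1]$ so that the logarithm stays finite, which is our own precaution.

```python
import math
import random

def sample_exponential(lam, rng):
    """Draw x from p(x) = exp(-x / lam) / lam by inverse transformation, Eq. (13.25)."""
    xi = 1.0 - rng.random()   # uniform on (0, 1], avoids log(0)
    return -lam * math.log(xi)

if __name__ == "__main__":
    rng = random.Random(7)                    # hypothetical seed
    for lam in (1.0, 5.0):
        xs = [sample_exponential(lam, rng) for _ in range(100_000)]
        print(f"lambda = {lam}: sample mean = {sum(xs) / len(xs):.3f}")
```

The sample mean should approach the mean free path $\langle x \rangle = \lambda$ as the number of samples grows.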

We pointed out already that it is certainly a drawback of this method that the cdf
$P(x)$ has to be calculated and inverted analytically. Even if $P(x)$ is not analytically
invertible, it is possible to employ the inverse transformation method by calculating
$P(x)$ for certain grid-points $x_i$ and then interpolating $P(x)$ piecewise between
these points with the help of an invertible function. However, in many cases it is
advantageous to employ the rejection method, which will be discussed next.
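
The grid-based variant mentioned above can be illustrated with a short sketch. All choices here are assumptions made for the example: the function name `make_tabulated_sampler`, an equidistant grid, the trapezoidal rule for tabulating $P(x_i)$, and linear interpolation as the invertible interpolating function. The pdf does not need to be normalized because the tabulated cdf is rescaled.

```python
import bisect
import math
import random

def make_tabulated_sampler(pdf, x_min, x_max, n_grid=2000):
    """Tabulate the cdf on a grid and return a sampler that inverts it piecewise linearly."""
    grid = [x_min + (x_max - x_min) * i / n_grid for i in range(n_grid + 1)]
    cdf = [0.0]
    for i in range(n_grid):                   # trapezoidal rule, cell by cell
        cell = 0.5 * (pdf(grid[i]) + pdf(grid[i + 1])) * (grid[i + 1] - grid[i])
        cdf.append(cdf[-1] + cell)
    total = cdf[-1]
    cdf = [c / total for c in cdf]            # enforce P(x_max) = 1

    def sample(rng):
        xi = rng.random()
        j = bisect.bisect_right(cdf, xi)      # first grid index with cdf[j] > xi
        j = min(max(j, 1), n_grid)
        c0, c1 = cdf[j - 1], cdf[j]
        t = 0.0 if c1 == c0 else (xi - c0) / (c1 - c0)
        return grid[j - 1] + t * (grid[j] - grid[j - 1])

    return sample

if __name__ == "__main__":
    rng = random.Random(1)                    # hypothetical seed
    sampler = make_tabulated_sampler(lambda x: math.exp(-0.5 * x * x), -6.0, 6.0)
    xs = [sampler(rng) for _ in range(50_000)]
    print("sample mean:", sum(xs) / len(xs))
```

The example samples a normal distribution truncated to $[-6, 6]$, for which the analytic inversion is not available.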
Fig. 13.1 The histogram representation of the pdf $p(x)$ vs $x$ generated by random numbers drawn
from an exponential distribution, Eq. (13.22), with the help of the inverse transformation sampling
method. Two different values for $\lambda$ have been considered, namely (a) $\lambda = 1$ and (b) $\lambda = 5$.
$N = 10^5$ random numbers have been sampled. The solid line corresponds to the pdf $p(x)$ according
to Eq. (13.22)


13.3  Rejection  Method 

The rejection method is particularly suitable if the inverse transformation method
fails [1, 6, 7]. One of the most prominent versions of the rejection method is the
Metropolis algorithm. It will be introduced in Sect. 14.3.

The basic idea of the rejection method is to draw random numbers $x$ from another,
preferably analytically invertible pdf $h(x)$ and check whether or not they lie within
the desired pdf $p(x)$. If this is the case the random number $x$ is accepted, otherwise
it will be rejected. This is also the basic idea of the hit and miss version of
Monte-Carlo integration which will be discussed in Sect. 14.2.

We specify the rejection method: Let $p(x)$ denote the pdf from which we want
to draw random numbers. Furthermore, let $h(x)$ be another pdf, which can easily
be sampled (for instance with the help of the inverse transformation method) and
which is chosen in such a way that the inequality

$$p(x) \le c\, h(x) , \tag{13.26}$$

holds for all $x \in [x_{\min}, x_{\max}]$, where $c \ge 1$ is some constant. The function $c\, h(x)$
is referred to as the envelope of $p(x)$ within the interval $[x_{\min}, x_{\max}]$. The strategy
is clear: we sample a random variable $x^t$ (trial state) from $h(x)$ and accept it with
probability $p(x^t)/[c\, h(x^t)]$. This procedure is sketched in Fig. 13.2.

Fig. 13.2 Schematic illustration of the rejection method. The trial state $x^t$ is
accepted with probability $p(x)/[c\, h(x)]$

Let $p(A|x)$ denote the probability that a given value $x$ is accepted and let $g(x)$ denote
the probability that we produce a variable $x$ with the help of this algorithm. Furthermore,
$P(x = x^t)$ stands for the probability that a trial state $x^t$ is generated. We have


$$g(x) \propto P(x = x^t)\, p(A|x^t) = h(x^t)\, \frac{p(x^t)}{c\, h(x^t)} \propto p(x^t) . \tag{13.27}$$


Hence, we indeed generate random numbers which follow the pdf $p(x)$. We may
also calculate the probability $P(A)$ that an arbitrary trial state $x^t$ is accepted. This is
done with the help of the marginalization rule (E.39):

$$P(A) = \int \mathrm{d}x^t\, p(A \wedge x^t) = \int \mathrm{d}x^t\, p(A|x^t) P(x = x^t) = \frac{1}{c} \int \mathrm{d}x^t\, p(x^t) = \frac{1}{c} . \tag{13.28}$$

More generally, the probability $P(A)$ to accept a $d$-dimensional random variable is
given by:


(13.29) 




We deduce that the larger $c$ is, the lower is the acceptance probability of the rejection
method. It is therefore advisable to choose the envelope $h(x)$ very carefully.

As an example we aim at sampling the normal distribution (E.43) for $x \in \mathbb{R}$,

$$p(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{x^2}{2\sigma^2}\right) , \tag{13.30}$$

with expectation value $\langle x \rangle = x_0 = 0$ and variance $\sigma^2$. In a first step we restrict our
investigation to $x \in [0, \infty)$ due to the symmetry of the pdf. The slightly modified
pdf for the right-half axis reads

$$q(x) = \sqrt{\frac{2}{\pi\sigma^2}} \exp\left(-\frac{x^2}{2\sigma^2}\right) , \qquad x \in [0, \infty) , \tag{13.31}$$


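where we adjusted the normalization. The complete normal distribution (13.30) is
re-obtained by sampling the sign of $x$ in an additional step. We use as an envelope
$h(x)$ the exponential distribution, Eq. (13.22). Furthermore, $\lambda$ and $c$ are chosen in
such a way that the acceptance probability (13.28) has a maximum under the
constraint (13.26). Since this is equivalent to $c \to \min$ we have to solve the
optimization problem

$$c \ge \frac{q(x)}{h(x)} \to \max . \tag{13.32}$$

The resulting $c_{\min}$ is then given by

$$c_{\min} = \frac{q(x_{\mathrm{opt}})}{h(x_{\mathrm{opt}})} , \tag{13.33}$$

and $x_{\mathrm{opt}}$ is the yet unknown optimal value for $x$. We obtain

$$\frac{\mathrm{d}}{\mathrm{d}x} \frac{q(x)}{h(x)}
= \sqrt{\frac{2\lambda^2}{\pi\sigma^2}}\, \frac{\mathrm{d}}{\mathrm{d}x} \exp\left(\frac{x}{\lambda} - \frac{x^2}{2\sigma^2}\right)
= \sqrt{\frac{2\lambda^2}{\pi\sigma^2}} \exp\left(\frac{x}{\lambda} - \frac{x^2}{2\sigma^2}\right) \left(\frac{1}{\lambda} - \frac{x}{\sigma^2}\right) , \tag{13.34}$$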


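and, therefore, requiring the derivative to vanish at $x = x_{\mathrm{opt}}$,

$$x_{\mathrm{opt}} = \frac{\sigma^2}{\lambda} . \tag{13.35}$$

Consequently, we have

$$c_{\min} = \sqrt{\frac{2\lambda^2}{\pi\sigma^2}} \exp\left(\frac{\sigma^2}{2\lambda^2}\right) . \tag{13.36}$$

The above relation gives the minimum value of $c$ for arbitrary values of $\lambda$. However,
since $h(x)$ is our envelope, we can choose $\lambda$ in such a way that $c_{\min} \to \min$. This is
achieved in a second step,

$$\frac{\mathrm{d}}{\mathrm{d}\lambda} c_{\min} = 0 , \tag{13.37}$$

which results in the optimum value $\lambda_{\mathrm{opt}} = \sigma$. This, finally, results together with
Eq. (13.36) in:

$$c_{\min} = \sqrt{\frac{2e}{\pi}} \approx 1.32 . \tag{13.38}$$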


The algorithm is executed in the following steps:

1. Draw a uniformly distributed random number $\xi \in [0, 1]$.

2. Calculate $x^t = -\lambda_{\mathrm{opt}} \ln(\xi)$, where $\lambda_{\mathrm{opt}} = \sigma$.

3. Draw a uniformly distributed random number $r \in [0, 1]$. If $r < q(x^t)/[c_{\min} h(x^t)]$,
then $x = x^t$ is accepted; if $r \ge q(x^t)/[c_{\min} h(x^t)]$, $x^t$ is rejected and we return
to step 1.

4. If $x^t$ was accepted, we draw a uniformly distributed random number $r \in [0, 1]$
and, if $r < 0.5$, we set $x = -x$; otherwise $x$ stays as is.

5. We repeat steps 1-4 until the number $N$ of desired random numbers has been
reached.
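
The five steps above can be sketched in Python as follows. The function name `sample_normal_rejection` and the returned acceptance ratio are our own additions for illustration; the envelope parameters $\lambda_{\mathrm{opt}} = \sigma$ and $c_{\min} = \sqrt{2e/\pi}$ are taken from the derivation in the text.

```python
import math
import random

def sample_normal_rejection(n, sigma=1.0, rng=None):
    """Return n normal random numbers (mean 0, std sigma) and the observed acceptance ratio."""
    rng = rng or random.Random()
    lam = sigma                                   # optimal envelope parameter
    c_min = math.sqrt(2.0 * math.e / math.pi)     # Eq. (13.38)

    def q(x):                                     # half-normal pdf, Eq. (13.31)
        return math.sqrt(2.0 / (math.pi * sigma ** 2)) * math.exp(-x * x / (2.0 * sigma ** 2))

    def h(x):                                     # exponential envelope pdf, Eq. (13.22)
        return math.exp(-x / lam) / lam

    samples, trials = [], 0
    while len(samples) < n:
        trials += 1
        xi = 1.0 - rng.random()                   # step 1, uniform on (0, 1]
        x_t = -lam * math.log(xi)                 # step 2, trial state drawn from h
        if rng.random() < q(x_t) / (c_min * h(x_t)):   # step 3, accept or reject
            if rng.random() < 0.5:                # step 4, sample the sign
                x_t = -x_t
            samples.append(x_t)
    return samples, n / trials                    # step 5 is the enclosing loop

if __name__ == "__main__":
    xs, acceptance = sample_normal_rejection(100_000, rng=random.Random(3))
    print(f"acceptance = {acceptance:.3f}, expected 1/c_min = {math.sqrt(math.pi / (2 * math.e)):.3f}")
```

The observed acceptance ratio should be close to $1/c_{\min} \approx 0.76$, in line with Eq. (13.28).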

Figure 13.3 shows random numbers obtained with the help of this method in a
histogram representation. We calculated (a) $N = 10^3$, (b) $N = 10^4$, and (c) $N = 10^5$
random numbers for $\sigma = 1$. It is quite obvious that the original pdf (13.30) is
approximated better the larger the number $N$ of sampled random numbers becomes.
Fig. 13.3 The histogram representation of the pdf $p(x)$ vs $x$ generated by random numbers
drawn from the normal distribution, Eq. (13.30) ($\sigma = 1$), with the help of the rejection method.
We sampled (a) $N = 10^3$, (b) $N = 10^4$, and (c) $N = 10^5$ random numbers. The solid line
represents the pdf $p(x)$ (13.30)


13.4  Probability  Mixing 

The method of probability mixing was developed to offer an algorithm which allows
one to generate random numbers by sampling piecewise defined or composite pdfs. Such
a pdf is of the general form

$$p(x) = \sum_{i=1}^{N} \alpha_i f_i(x) , \qquad \alpha_i > 0 , \tag{13.39}$$

where the sub-pdfs $f_i(x)$ fulfill the normalization requirement

$$\int \mathrm{d}x\, f_i(x) = 1 , \tag{13.40}$$

and are non-negative, i.e.

$$f_i(x) \ge 0 , \tag{13.41}$$

for all $i = 1, \ldots, N$. It follows that

$$\sum_{i=1}^{N} \alpha_i = 1 , \tag{13.42}$$

which ensures that

$$\int \mathrm{d}x'\, p(x') = 1 \tag{13.43}$$

is fulfilled. The question is how to sample random numbers from such a pdf, since
in most cases it might be hard to invert the sum (inverse transformation method)
or find a suitable envelope (rejection method). However, this question can easily be
answered: We define


$$q_i = \sum_{\ell=1}^{i} \alpha_\ell . \tag{13.44}$$

Thus, $q_N = 1$ and the interval [0, 1] has been divided according to:

[Sketch: the interval from 0 to 1, partitioned by the points $q_1 < q_2 < \ldots < q_N = 1$]

The index $i$ of the relevant pdf is determined by the condition

$$q_{i-1} < r < q_i , \tag{13.45}$$

where $r \in [0, 1]$ is a uniformly distributed random number and $q_0 = 0$. We then draw the
required random number $x$ from the sub-pdf $f_i(x)$ with any of the methods discussed
above.

This procedure is quite plausible, since the coefficients $\alpha_i$ give the relative weight
of the sub-pdfs $f_i(x)$. In particular, $\alpha_i$ determines the importance of the sub-pdf $f_i(x)$.
It is, therefore, a natural approach to use $\alpha_i$ as a measure of the probability that a
random variable is to be sampled from the particular sub-pdf $f_i(x)$.
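
A compact sketch of probability mixing is given below. The function name `make_mixture_sampler`, the representation of the sub-pdfs as callables, and the example mixture (30 % uniform on [0, 1], 70 % exponential with $\lambda = 2$) are assumptions made purely for illustration.

```python
import bisect
import itertools
import math
import random

def make_mixture_sampler(weights, samplers):
    """weights: alpha_i > 0 summing to one; samplers: callables rng -> x drawing from f_i."""
    q = list(itertools.accumulate(weights))       # q_i = alpha_1 + ... + alpha_i, Eq. (13.44)
    if abs(q[-1] - 1.0) > 1e-12:
        raise ValueError("weights must sum to one, Eq. (13.42)")

    def sample(rng):
        r = rng.random()
        # zero-based index i with q_{i-1} <= r < q_i; min() guards against rounding in q_N
        i = min(bisect.bisect_right(q, r), len(q) - 1)
        return samplers[i](rng)

    return sample

if __name__ == "__main__":
    rng = random.Random(5)                        # hypothetical seed
    uniform = lambda g: g.random()
    exponential = lambda g: -2.0 * math.log(1.0 - g.random())
    mixture = make_mixture_sampler([0.3, 0.7], [uniform, exponential])
    xs = [mixture(rng) for _ in range(100_000)]
    print("sample mean:", sum(xs) / len(xs), "exact:", 0.3 * 0.5 + 0.7 * 2.0)
```

Each sub-pdf may itself be sampled by inverse transformation or rejection, as stated in the text.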




Summary 

Generating random numbers is essential for many applications in Computational
Physics. This chapter concentrated on basic methods to generate the desired random
numbers: (a) the direct sampling method used transformations of the uniform
distribution to generate the required random numbers; (b) the inverse transformation
method was based on the availability of an inverse cdf which was in most cases
required to be calculated analytically; finally, (c) the rejection method was
basically a hit or miss method. It used an easily invertible pdf $h(x)$ which
enveloped the desired pdf $p(x)$ completely within some interval $x \in [x_{\min}, x_{\max}]$. The
effectiveness of this method depended on how ‘well’ the envelope $h(x)$ enclosed
the original pdf $p(x)$. In a last step the method of probability mixing was discussed.
It was an easily verifiable method which allowed one to sample random numbers from
composite pdfs.


Problems 

Draw random numbers from the following pdfs:

1. Direct Sampling:

Sample the normal distribution with $\langle x \rangle = 0$ and $\sigma = 1$ with the help of the
method discussed in Sect. 13.1. Check the result by plotting the random numbers
against the pdf $p(x)$ in a histogram.

2. Inverse Transformation Method:

Write a function which samples random numbers from the exponential
distribution with the help of the inverse transformation method as discussed in
Sect. 13.2. Compare the generated random numbers to the pdf in a histogram.

3. Rejection Method:

Sample the normal distribution with $\langle x \rangle = 0$ and $\sigma = 1$ with the help of the
exponential distribution as discussed in Sect. 13.3. Compare the generated random
numbers with the pdf in a histogram. Determine the acceptance probability
numerically.

4. Probability Mixing:

We choose an alternative envelope for the normal distribution with $\langle x \rangle = 0$
and $\sigma = 1$. This envelope is chosen to be constant for all $|x| < x_0$, and decays
exponentially for $|x| > x_0$. ($x_0$ is a parameter of your choice.) The parameters
do not need to optimize the acceptance probability. Again, plot the generated
random numbers in a histogram and compare the acceptance probability with the
acceptance probability of problem 3.


References 




1. Devroye, L.: Non-uniform Random Variate Generation. Springer, Berlin/Heidelberg (1986)

2. Chow, Y.S., Teicher, H.: Probability Theory, 3rd edn. Springer Texts in Statistics. Springer,
Berlin/Heidelberg (1997)

3. Klenke, A.: Probability Theory. Universitext. Springer, Berlin/Heidelberg (2008)

4. Stroock, D.W.: Probability Theory. Cambridge University Press, Cambridge (2011)

5. Bratley, P., Fox, B.L., Schrage, L.E.: A Guide to Simulation. Springer, Berlin/Heidelberg (1987)

6. Knuth, D.: The Art of Computer Programming, vol. II, 3rd edn. Addison Wesley, Menlo Park
(1998)

7. Press, W.H., Teukolsky, S.A., Vetterling, W.T., Flannery, B.P.: Numerical Recipes in C++, 2nd
edn. Cambridge University Press, Cambridge (2002)


Chapter  14 

A  Brief  Introduction  to  Monte-Carlo  Methods 


14.1  Introduction 

This chapter presents a brief introduction to Monte-Carlo methods in general, and
to Monte-Carlo integration as well as to the Metropolis-Hastings algorithm in
particular. A detailed discussion of the fundamental concepts involved is postponed
to Chap. 16. The introduction given here is not supposed to be self-contained and
methods will be introduced without reference to their background.

The notion of Monte-Carlo methods, Monte-Carlo algorithms or Monte-Carlo
techniques is not well defined. In particular, the term Monte-Carlo summarizes a
wide field of methods which are based on the sampling of random numbers [1-3].
In general, the advantage of Monte-Carlo algorithms lies in their computational
strength. In many cases it is simply not feasible to employ deterministic methods due
to their very high computational cost. However, in many cases the use of methods
based on random sampling is also motivated by the nature of the processes to be
described. We mentioned in the previous chapter as a typical example the radioactive
decay of some nucleus. This process is believed to be purely stochastic in nature.

The development of Monte-Carlo techniques was initiated in the 1940s by J.
von Neumann, S. M. Ulam and N. Metropolis who coined the term Monte-Carlo
methods. One of the earliest illustrations of the principle of Monte-Carlo
techniques in general, and of Monte-Carlo integration in particular, is the Monte-Carlo
approximation of $\pi$. The discussion which follows now includes the essential
ideas of Monte-Carlo integration.

We regard the unit square characterized by the corner points (0, 0), (0, 1), (1, 0),
and (1, 1). The area $A_s$ of this square is one. We insert a quarter-circle of radius $r = 1$
which, consequently, possesses the area $A_c = \pi/4$. Suppose we are throwing darts
at this unit square in such a way that the impact points are uniformly distributed;
then the probability $P$ that a certain dart becomes stuck within the interior of the
quarter-circle is given by

$$P = \frac{A_c}{A_s} = A_c = \frac{\pi}{4} = 0.785398\ldots . \tag{14.1}$$

From a probabilistic point of view, we have after $N$ throws, of which $n$ hit the
interior of the quarter-circle, the probability:

$$P = \lim_{N \to \infty} \frac{n}{N} . \tag{14.2}$$

The strategy is clear: we draw pairs of uniformly distributed random numbers $(x_i, y_i)$
from the interval [0, 1]. These are the impact points of the darts. Repeating this
experiment $N$ times and counting the number of hits $n$ within the quarter-circle
allows us to approximate $\pi$ via

$$P = \frac{\pi}{4} \approx \frac{n}{N} . \tag{14.3}$$
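
The dart experiment is easily reproduced. The sketch below uses Python's built-in generator instead of the linear congruential generators considered in the text, so the numbers it prints will differ from those of Table 14.1; the function name `estimate_pi` is hypothetical.

```python
import random

def estimate_pi(n, rng):
    """Approximate pi from n uniformly distributed points in the unit square, Eq. (14.3)."""
    hits = 0
    for _ in range(n):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:      # point lies inside the quarter-circle
            hits += 1
    return 4.0 * hits / n

if __name__ == "__main__":
    rng = random.Random(281)          # hypothetical seed
    for n in (10, 100, 1000, 10_000, 100_000):
        print(n, estimate_pi(n, rng))
```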


The resulting approximation of $\pi$ will be strongly influenced by the number
of experiments $N$ as well as by the performance of the random number generator
used. Table 14.1 lists computed approximations of $\pi$ for different numbers of
experiments $N$ as they were obtained with the help of a linear congruential generator.
Linear congruential generators have been introduced and discussed in Sect. 12.2.
The parameters used to initialize the generators are given in the caption of the table.
Furthermore, Fig. 14.1 illustrates the result after $N = 10^3$ experiments for both
generators.


Table 14.1 Approximate values $\pi_a^{(i)}$ obtained with the method discussed in the text. The linear
congruential generators are initialized by the following parameters: generator (1): $a = 7^5$, $c = 0$,
$m = 2^{31} - 1$, and $x_0 = 281$ (Park-Miller) and generator (2): $a = 7^5$, $c = 0$, $m = 2^{11}$, and
$x_0 = 281$. We also give the absolute errors $|\pi_a^{(i)} - \pi|$

N        $\pi_a^{(1)}$    $|\pi_a^{(1)} - \pi|$    $\pi_a^{(2)}$    $|\pi_a^{(2)} - \pi|$
$10$     2.8000     0.34159     2.8000     0.34159
$10^2$   2.9200     0.22159     3.1600     0.01841
$10^3$   3.1600     0.01841     3.1840     0.04241
$10^4$   3.1304     0.01119     3.1868     0.04521
$10^5$   3.1358     0.00579     3.1875     0.04589
$10^6$   3.1393     0.00229     3.1875     0.04599
$10^7$   3.1413     0.00028     3.1875     0.04599
Fig. 14.1 $N = 10^3$ uniformly distributed random numbers within the unit-square. Frame (a) gives
the results for generator (1) while frame (b) is for generator (2). The number of elements within
the quarter-circle indicated by the solid line determines the value of $\pi_a^{(i)}$. The inferior result of the
approximation obtained with generator (2) [frame (b)] originates in correlations between the $x$ and
$y$ coordinates


14.2 Monte-Carlo Integration

We generalize the ideas formulated above and consider a function $f(x) \ge 0$ for
$x \in [a, b] \subset \mathbb{R}$ where the area of interest is

$$A = \int_a^b \mathrm{d}x\, f(x) . \tag{14.4}$$

We denote

$$\xi = \max_{x \in [a, b]} f(x) , \tag{14.5}$$

and obtain using the above example

$$A = A_s \lim_{N \to \infty} \frac{n}{N} , \tag{14.6}$$

where $n$ is the number of random points under the curve indicated schematically in
Fig. 14.2. The area $A_s$ is given by

$$A_s = (b - a)\xi , \tag{14.7}$$

and the random numbers $r_i = (x_i, y_i)$ are uniformly distributed within the intervals
$x_i \in [a, b]$ and $y_i \in [0, \xi]$. This method is referred to as hit and miss integration [4].
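
Hit and miss integration can be sketched as follows. The function name `hit_and_miss` and the argument `f_max` (the maximum of $f$ on $[a, b]$, which the caller has to supply) are naming choices made for this example; the test integrand $\sin x$ on $[0, \pi]$, with exact area 2, is likewise our own.

```python
import math
import random

def hit_and_miss(f, a, b, f_max, n, rng):
    """Estimate the area under f >= 0 on [a, b] from n random points, Eqs. (14.6) and (14.7)."""
    hits = 0
    for _ in range(n):
        x = a + (b - a) * rng.random()    # x_i uniform in [a, b]
        y = f_max * rng.random()          # y_i uniform in [0, f_max]
        if y <= f(x):                     # point lies under the curve
            hits += 1
    area_s = (b - a) * f_max              # area of the enclosing rectangle
    return area_s * hits / n

if __name__ == "__main__":
    rng = random.Random(99)               # hypothetical seed
    print(hit_and_miss(math.sin, 0.0, math.pi, 1.0, 100_000, rng))
```

Any upper bound of $f$ may be used in place of the exact maximum; a looser bound merely lowers the fraction of hits and therefore increases the statistical error.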
Fig. 14.2 Schematic illustration of the Monte-Carlo integration technique


Another way to perform a Monte-Carlo integration is the so-called mean-value
integration. It is essentially based on the mean value theorem of calculus which we
already employed in our discussion of quadrature in Chap. 3. We restate it here for
the sake of a more transparent presentation: The mean-value theorem states that if
$f(x)$ is a continuous function for $x \in [a, b]$ then there exists a $z \in (a, b)$ such that

$$\int_a^b \mathrm{d}x\, f(x) = f(z)(b - a) . \tag{14.8}$$

The function value $f(z) = \langle f \rangle$ is referred to as the expectation value or mean value
of $f(x)$. We know from probability theory [5-7] that the expectation value can be
approximated by the arithmetic mean $\bar{f}$,

$$\frac{1}{b - a} \int_a^b \mathrm{d}x'\, f(x') \approx \bar{f} \pm \sqrt{\frac{\overline{f^2} - \bar{f}^{\,2}}{N}} , \tag{14.9}$$

with the error given by the standard error, Eq. (E.14). The arithmetic mean $\bar{f}$, on the
other hand, is given by

$$\bar{f} = \frac{1}{N} \sum_{i=1}^{N} f(x_i) , \tag{14.10}$$

and consequently

$$\overline{f^2} = \frac{1}{N} \sum_{i=1}^{N} f^2(x_i) . \tag{14.11}$$

Note that here the variables $x_i$ are assumed to be uniformly distributed random
numbers within the interval $[a, b]$. (This result will be discussed in more detail
shortly.) However, first of all we note from the law of large numbers, Eq. (E.25),
that this approach is exact in the limit $N \to \infty$:

$$\frac{1}{b - a} \int_a^b \mathrm{d}x\, f(x) = \lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} f(x_i) . \tag{14.12}$$
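
Mean-value integration, including an estimate of the statistical error, may be sketched as follows. The function name `mean_value_integral` is hypothetical, and the error is computed from the sample averages of $f$ and $f^2$ as the square root of their variance estimate divided by $N$, which is our reading of the standard error referred to in the text.

```python
import math
import random

def mean_value_integral(f, a, b, n, rng):
    """Return (estimate, error) for the integral of f over [a, b] using n uniform points."""
    total = total_sq = 0.0
    for _ in range(n):
        fx = f(a + (b - a) * rng.random())
        total += fx
        total_sq += fx * fx
    mean = total / n                      # arithmetic mean of f
    mean_sq = total_sq / n                # arithmetic mean of f^2
    std_err = math.sqrt(max(mean_sq - mean * mean, 0.0) / n)
    return (b - a) * mean, (b - a) * std_err

if __name__ == "__main__":
    rng = random.Random(8)                # hypothetical seed
    estimate, error = mean_value_integral(math.sin, 0.0, math.pi, 100_000, rng)
    print(f"integral = {estimate:.4f} +/- {error:.4f} (exact: 2)")
```

Unlike hit and miss integration, this variant needs neither an upper bound of $f$ nor the restriction $f(x) \ge 0$.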


Let us now consider the more general case which, in the end, will guide us to
a very prominent formulation of Monte-Carlo integration. We want to estimate the
expectation value

$$\langle f \rangle = \int \mathrm{d}x\, f(x) p(x) , \tag{14.13}$$

where $x \in \mathbb{R}^d$ and $p(x)$ is a pdf. A typical example is the calculation of the thermal
expectation value in statistical physics where the pdf $p(x)$ is given by the normalized
Boltzmann distribution

$$p(x) = \frac{1}{Z} \exp\left[-\frac{E(x)}{k_B T}\right] . \tag{14.14}$$

Here $E(x)$ denotes the energy as a function of the parameter $x \in \mathbb{R}^d$, $k_B$ stands
for Boltzmann’s constant, $T$ is the temperature, and the normalization factor $Z$ is
referred to as the canonical partition function [8-11].

Equation (14.13) may be rewritten as

$$\langle f \rangle = \int \mathrm{d}x\, f(x) p(x) = \int \mathrm{d}f\, f\, q(f) , \tag{14.15}$$

where we introduced the probability density $q(f)$ of $f$ via

$$q(f) = \int \mathrm{d}x\, \delta[f - f(x)]\, p(x) , \tag{14.16}$$


with $\delta(\cdot)$ Dirac’s $\delta$-distribution. Let us briefly explain how we arrived at this
definition. Let the cdf $P(x)$ be defined by¹

$$P(x) = \Pr(X \le x) = \int_{-\infty}^{x} \mathrm{d}x'\, p(x') . \tag{14.17}$$

¹Please note that according to the conventions established in Appendix E capital letters denote
random variables.

We define in analogy the cdf $Q(f)$:

$$Q(f) = \Pr(F \le f) = \Pr[f(X) \le f] . \tag{14.18}$$


Note that we distinguish between the function $f(X)$ of the random variable $X$ [which
follows the pdf $p(X)$] and the particular function value $f \in \mathbb{R}$. Furthermore, the
probability $\Pr[f(X) \le f]$ can be rewritten as

$$\Pr[f(X) \le f] = \sum_n \Pr(a_n \le X \le b_n) , \tag{14.19}$$

where the values $a_n < b_n$ are the ordered intersection points $a_1 < b_1 < a_2 < b_2 <
\ldots < a_N < b_N$ chosen in such a way that

$$f(a_n) = f(b_n) = f , \quad \text{and} \quad f[x \in (a_n, b_n)] < f . \tag{14.20}$$


It is a matter of the particular form of $f(x)$ whether or not the boundary points have
to be included. The individual terms of Eq. (14.19) can be rewritten:

$$\Pr(a_n \le X \le b_n) = P(b_n) - P(a_n) = \int_{a_n}^{b_n} \mathrm{d}x\, p(x) . \tag{14.21}$$


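The pdf $q(f)$ is related to the cdf $Q(f)$ via

$$q(f) = \frac{\mathrm{d}}{\mathrm{d}f} Q(f) , \tag{14.22}$$

and we obtain

$$q(f) = \sum_n \frac{\mathrm{d}}{\mathrm{d}f} \Pr(a_n \le X \le b_n)
= \sum_n \frac{\mathrm{d}}{\mathrm{d}f} \int_{a_n}^{b_n} \mathrm{d}x\, p(x)
= \sum_n \left[ \frac{\mathrm{d}b_n}{\mathrm{d}f} p(b_n) - \frac{\mathrm{d}a_n}{\mathrm{d}f} p(a_n) \right]
= \sum_n \left[ \left( \frac{\mathrm{d}f(x)}{\mathrm{d}x} \right)^{-1} \Bigg|_{x = b_n} p(b_n) - \left( \frac{\mathrm{d}f(x)}{\mathrm{d}x} \right)^{-1} \Bigg|_{x = a_n} p(a_n) \right] . \tag{14.23}$$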
However, we know from Eq. (14.20) that:

$$\frac{\mathrm{d}f(x)}{\mathrm{d}x} \Bigg|_{x = b_n} > 0 \quad \text{and} \quad \frac{\mathrm{d}f(x)}{\mathrm{d}x} \Bigg|_{x = a_n} < 0 . \tag{14.24}$$


We introduce the intersection points $x_k$ where $x_1 < x_2 < \ldots < x_K$ and $K = 2N$ (if
the boundary points are not included) for which $f(x_1) = f(x_2) = \ldots = f(x_K) = f$.
Hence, Eq. (14.23) may be rewritten as

$$q(f) = \sum_{k=1}^{K} \left| \frac{\mathrm{d}f(x)}{\mathrm{d}x} \right|^{-1}_{x = x_k} p(x_k) = \sum_{k=1}^{K} \frac{p(x_k)}{|f'(x_k)|} . \tag{14.25}$$


We want to cast this result into a more compact form and remember that the Dirac
$\delta$-distribution of an arbitrary function $g(y)$ can be expressed as [12]

$$\delta[g(y)] = \sum_i \frac{\delta(y - y_i)}{|g'(y_i)|} , \tag{14.26}$$

where the $y_i$ are the zeros of $g(y)$, i.e. $g(y_i) = 0$. Hence, we arrive at the final form
of Eq. (14.23)²:

$$q(f) = \int \mathrm{d}x\, \delta[f - f(x)]\, p(x) . \tag{14.27}$$


We note, furthermore, that

$$\int \mathrm{d}f\, q(f) = \int \mathrm{d}f \int \mathrm{d}x\, \delta[f - f(x)]\, p(x) = \int \mathrm{d}x\, p(x) = 1 . \qquad (14.28)$$
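The relation (14.25) can be checked numerically. The following minimal sketch assumes, purely for illustration, that $X$ is uniformly distributed on $[-1, 1]$ and that $f(x) = x^2$, so that the two intersection points are $\pm\sqrt{f}$. The helper names `q_of_f`, `p`, `fprime` and `roots` are hypothetical and not part of the text.

```python
import numpy as np


def q_of_f(f, roots, p, fprime):
    """Evaluate Eq. (14.25): sum of p(x_k) / |f'(x_k)| over all x_k with f(x_k) = f."""
    return sum(p(xk) / abs(fprime(xk)) for xk in roots(f))


# Illustrative assumptions: X uniform on [-1, 1] and f(x) = x**2.
def p(x):
    return 0.5 if -1.0 <= x <= 1.0 else 0.0


def fprime(x):
    return 2.0 * x


def roots(f):
    # The two intersection points x_1 < x_2 with x**2 = f (valid for 0 < f <= 1).
    return (-np.sqrt(f), np.sqrt(f))


rng = np.random.default_rng(0)
samples = rng.uniform(-1.0, 1.0, size=200_000) ** 2  # realizations of f(X)

# Histogram estimate of q(f) on [0.05, 1]. The counts are normalized by the
# total number of samples so that the result approximates the pdf itself.
counts, edges = np.histogram(samples, bins=20, range=(0.05, 1.0))
centers = 0.5 * (edges[:-1] + edges[1:])
q_hist = counts / (samples.size * np.diff(edges))
q_exact = np.array([q_of_f(c, roots, p, fprime) for c in centers])

max_rel_dev = np.max(np.abs(q_hist - q_exact) / q_exact)
print(f"largest relative deviation: {max_rel_dev:.3f}")
```

For this choice Eq. (14.25) reduces to $q(f) = 1/(2\sqrt{f})$ on $(0, 1]$; the lower histogram bound of 0.05 merely avoids the integrable singularity at $f = 0$.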


2 We give an example. Suppose $f(x) = \exp(x)$. Then we deduce that

$$\delta[f - \exp(x)] = \frac{\delta(x - \ln f)}{f} ,$$

and, consequently,

$$q(f) = \frac{p(\ln f)}{f} , \quad f > 0 .$$

A second example was given in Chap. 13 where we derived the pdf (13.5).
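As a result, the variance of $f$, $\mathrm{var}(f)$, can be expressed as

$$\mathrm{var}(f) = \int \mathrm{d}x\, [f(x) - \langle f \rangle]^2 p(x) = \int \mathrm{d}f\, [f - \langle f \rangle]^2 q(f) . \qquad (14.29)$$

Let us define in a next step the arithmetic mean of $f(X)$,

$$\bar{f} = \frac{1}{N} \sum_{i=1}^{N} f(x_i) = \frac{1}{N} \sum_{i=1}^{N} f_i , \qquad (14.30)$$

calculated with the help of $N$ random numbers. Hence, we have

$$\langle \bar{f} \rangle = \langle f \rangle , \qquad (14.31)$$

and

$$\mathrm{var}(\bar{f}) = \frac{\mathrm{var}(f)}{N} , \qquad (14.32)$$

according to Appendix E. It follows from the central limit theorem, Appendix
Sect. E.8, that for large values of $N$ the pdf of $\bar{f}$, $p(\bar{f})$, converges to a normal
distribution (E.43) with mean $\langle \bar{f} \rangle$ and variance $\mathrm{var}(\bar{f})$:

$$p(\bar{f}) = \frac{1}{\sqrt{2\pi\, \mathrm{var}(\bar{f})}} \exp\left[ -\frac{(\bar{f} - \langle \bar{f} \rangle)^2}{2\, \mathrm{var}(\bar{f})} \right] . \qquad (14.33)$$

Based on this property, $\langle f \rangle$ can be estimated from

$$\langle f \rangle = \bar{f} \pm \sqrt{\frac{\mathrm{var}(f)}{N}} = \frac{1}{N} \sum_{i=1}^{N} f(x_i) \pm \sqrt{\frac{\mathrm{var}(f)}{N}} . \qquad (14.34)$$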


Here, the random numbers $x_i$ are sampled from the pdf $p(x)$. This method is the
most prominent formulation of Monte-Carlo integration.

We shall briefly discuss some properties of this method. We deduce from
Eq. (14.34) that the error scales like $N^{-1/2}$. In contrast to the integration methods
we discussed in Chap. 3, $N$ is no longer the number of grid-points but the number
of random numbers sampled.³ In principle, this error scaling is worse than in the
case of classical integrators. For instance, in the case of the central rectangular rule
(Sect. 3.2) we had an error scaling of $N^{-2}$ when summed over the whole interval.
However, we obtained this result for the one-dimensional case. In $d$ dimensions
a grid with spacing $h$ requires $N \propto h^{-d}$ grid-points, so that the error of the same rule
scales only like $N^{-2/d}$. On the other hand, in Eq. (14.34) $N$
corresponds to the number of $d$-dimensional random numbers $x$, and the scaling
$N^{-1/2}$ holds independently of $d$. Hence, Monte-Carlo integration can be of advantage
whenever one has to deal with complicated, high-dimensional integrals. In contrast,
restricted to one dimension it is in most cases not an improvement over the methods
discussed already.

3 Nevertheless, there is certainly some conceptual similarity between grid-points and random
numbers within this context.
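The estimate (14.34) and its $N^{-1/2}$ error scaling can be illustrated with a short sketch. It assumes a uniform pdf on the unit hypercube $[0, 1]^d$ with $d = 6$ and the integrand $\sum_i x_i^2$, whose exact integral is $d/3$. These choices, as well as the names `mc_integrate`, `sample` and `f`, are illustrative assumptions; the unknown $\mathrm{var}(f)$ is replaced by the sample variance.

```python
import numpy as np


def mc_integrate(f, sample, n, rng):
    """Return the estimate of <f> and its one-sigma error according to Eq. (14.34)."""
    fx = f(sample(n, rng))
    mean = fx.mean()
    # var(f) is unknown in general; here it is replaced by the sample variance.
    error = np.sqrt(fx.var(ddof=1) / n)
    return mean, error


d = 6  # dimension (illustrative choice)


def sample(n, rng):
    # n points distributed uniformly in the unit hypercube [0, 1]^d
    return rng.random((n, d))


def f(x):
    # integrand: sum of squared coordinates, exact integral is d / 3
    return np.sum(x**2, axis=1)


rng = np.random.default_rng(1)
for n in (10**3, 10**4, 10**5):
    estimate, error = mc_integrate(f, sample, n, rng)
    print(f"N = {n:6d}: {estimate:.4f} +/- {error:.4f} (exact {d / 3:.4f})")
```

Each tenfold increase of $N$ should reduce the reported error by roughly a factor of $\sqrt{10}$, regardless of the dimension $d$.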

Monte-Carlo integration can also be of advantage whenever the integrand $f(x)$ is
not well behaved. In such a case a very fine grid would be required to compute a
reasonable estimate of the true value of the integral. Monte-Carlo integration offers
a very convenient alternative due to its conceptual simplicity [13].

It is certainly a drawback of Monte-Carlo integration in its formulation (14.34)
that the error is also proportional to $\sqrt{\mathrm{var}(f)}$, which is an as yet unknown quantity.
One has to approximate it with an adequate estimator, for instance with the help of
the sampling variance. Moreover, if the variance $\mathrm{var}(f)$ diverges, the central limit
theorem does not hold and the procedure (14.34), in particular its error estimate,
is no longer justified.

Closely related to the problem of how to determine $\mathrm{var}(f)$ is the question of
how many random numbers should be drawn. In most cases an iterative approach
is the most promising strategy. In a first step $N$ random numbers are drawn and the
integral is computed using Eq. (14.34). Then another set of $N$ random numbers is
sampled and Eq. (14.34) is reevaluated, now using all $2N$ random numbers. If the
change in the resulting estimate of the integral is less than some given tolerance $\epsilon$,
the loop is terminated; otherwise another set of $N$ random numbers is added.
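The iterative strategy described above might be sketched as follows. The function name `mc_iterative`, the batch size, the tolerance and the safety limit `max_batches` are assumptions made for this illustration; the test integral is $\int_0^1 \mathrm{d}x\, \exp(x) = e - 1$ with a uniform pdf.

```python
import numpy as np


def mc_iterative(f, sample, batch, tol, rng, max_batches=1000):
    """Add batches of random numbers until the estimate changes by less than tol."""
    total = np.sum(f(sample(batch, rng)))
    n = batch
    estimate = total / n
    for _ in range(max_batches):
        total += np.sum(f(sample(batch, rng)))
        n += batch
        new_estimate = total / n  # re-evaluated with all n random numbers
        converged = abs(new_estimate - estimate) < tol
        estimate = new_estimate
        if converged:
            break
    return estimate, n


rng = np.random.default_rng(3)
estimate, n_used = mc_iterative(
    f=np.exp,
    sample=lambda n, rng: rng.random(n),  # uniform pdf on [0, 1]
    batch=10_000,
    tol=1e-4,
    rng=rng,
)
print(estimate, n_used, np.e - 1.0)
```

Note that a small change between two successive estimates can occur by chance, so the criterion does not guarantee an error below the tolerance; in practice it is sensible to inspect the error estimate of Eq. (14.34) as well.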

We mention that this form of Monte-Carlo integration can be improved, in particular,
by sampling predominantly from points which contribute most to the integral.
This method is referred to as importance sampling [13-16] and will be discussed in
more detail later on.
The Metropolis algorithm is a more sophisticated method to produce random
numbers from given distributions. In fact, the Metropolis algorithm is a special
form of the rejection method (Sect. 13.3). This section introduces the algorithm on
a very basic level which will, in the end, allow a first glance at an interesting model
of statistical physics, namely the Ising model. This model will be discussed in Chap. 15,
and a more detailed discussion of the Metropolis algorithm is postponed to
Sect. 16.4.

The Metropolis algorithm is particularly useful to treat problems in statistical
physics where thermodynamic expectation values of some observable $O$ are of
interest [8-11]. They are defined as

$$\langle O \rangle = \int \mathrm{d}x\, O(x)\, q(x) , \qquad (14.35)$$

where $x$ is a set of parameters and $q(x)$ is the Boltzmann distribution (14.14).
The set of parameters $x$ could be, for instance, the position- and momentum-space
coordinates of $N$ different particles. In most cases $x$ is a high-dimensional object
which makes classical numerical integration (Chap. 3) cumbersome. Instead, Monte-Carlo
integration is employed and the integral (14.35) is approximated with the help
of Eq. (14.34) by

$$\langle O \rangle \approx \frac{1}{N} \sum_{i=1}^{N} O(x_i) \pm \sqrt{\frac{\mathrm{var}(O)}{N}} , \qquad (14.36)$$


where the uncorrelated random numbers $x_i$, $i = 1, 2, \ldots, N$, are sampled from the
pdf, Eq. (14.14). We recognize the problem immediately: we need to know the exact
functional form of $q(x)$ if we want to apply either the inverse transformation method
or the rejection method discussed in Chap. 13. However, the partition function $Z$
itself is determined by an integral which can be approximated using Eq. (14.36). We
set

$$q(x) = \frac{p(x)}{Z} , \qquad (14.37)$$

and

$$Z = \int \mathrm{d}x\, p(x) \qquad (14.38)$$

follows from the normalization of $q(x)$. The Metropolis algorithm was
designed to avoid precisely this problem. We concentrate on a pdf which is of
the form (14.37), but $q(x)$ need not necessarily be described by a normalized
Boltzmann distribution, Eq. (14.14). Thus, $p(x)$ is arbitrary but it ensures that

$$\int \mathrm{d}x\, q(x) = 1 \qquad (14.39)$$


and $q(x) \ge 0$ for all $x$. In other words, $q(x)$ is a pdf. Suppose we already have a
sequence $x_0, x_1, \ldots, x_n = \{x_n\}$ of parameters which indeed follows the pdf $q(x)$.⁴
We now add to the last element of this sequence, $x_n$, a small perturbation $\delta$ and set

$$x_t = x_n + \delta . \qquad (14.40)$$

Note that the perturbation $\delta$ is of the same dimension as the vector $x$. Similar to the
rejection method, we seek a criterion which helps us to decide whether or not the
test value $x_t$ can be accepted as the next element of the sequence $\{x_n\}$.

4 The question of how one can obtain such a sequence will be discussed in Sect. 16.3.

The Metropolis method proposes an acceptance probability of the form

$$\Pr(A|x_t, x_n) = \begin{cases} 1 & \text{if } \dfrac{q(x_t)}{q(x_n)} \ge 1 , \\[2mm] \dfrac{q(x_t)}{q(x_n)} & \text{otherwise.} \end{cases} \qquad (14.41)$$


Hence, if $\Pr(A|x_t, x_n) = 1$, we set $x_{n+1} = x_t$, and if $\Pr(A|x_t, x_n) < 1$, we draw a
random number $r \in [0, 1]$ and accept $x_t$ if $r < \Pr(A|x_t, x_n)$ and reject $x_t$ otherwise.
We note that in this formulation the knowledge of the normalization factor $Z$ is no
longer required since it follows from Eq. (14.37) that

$$\frac{q(x_t)}{q(x_n)} = \frac{p(x_t)}{p(x_n)} . \qquad (14.42)$$

Consequently, we rewrite Eq. (14.41) as

$$\Pr(A|x_t, x_n) = \min\left( \frac{p(x_t)}{p(x_n)}, 1 \right) \equiv p(x_t|x_n) , \qquad (14.43)$$

where we introduced in the last step a more compact notation.
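The steps (14.40) to (14.43) can be condensed into a few lines of code. The sketch below is a minimal illustration under stated assumptions, not a definitive implementation: the perturbation $\delta$ is drawn uniformly from $[-\texttt{step}, \texttt{step}]$, the weight $p(x)$ is an unnormalized one-dimensional Gaussian, and on rejection the old value $x_n$ is kept and counted again (the usual convention, which the text above does not state explicitly). The names `metropolis`, `step` and `x0` are hypothetical.

```python
import numpy as np


def metropolis(p, x0, n_steps, step, rng):
    """Generate a chain of length n_steps using the acceptance probability (14.43).

    p    : unnormalized weight p(x) > 0 (the normalization Z is never needed)
    step : half-width of the uniform perturbation delta (a tuning assumption)
    """
    chain = np.empty(n_steps)
    x = x0
    n_accepted = 0
    for n in range(n_steps):
        x_t = x + rng.uniform(-step, step)     # test value, Eq. (14.40)
        acceptance = min(p(x_t) / p(x), 1.0)   # Eq. (14.43)
        if rng.random() < acceptance:
            x = x_t
            n_accepted += 1
        # On rejection the old value is kept and counted again
        # (standard convention, stated here as an assumption).
        chain[n] = x
    return chain, n_accepted / n_steps


def p(x):
    # unnormalized Gaussian weight; the factor 1 / sqrt(2 pi) is deliberately omitted
    return np.exp(-0.5 * x * x)


rng = np.random.default_rng(4)
chain, rate = metropolis(p, x0=3.0, n_steps=50_000, step=2.0, rng=rng)
equilibrated = chain[1000:]  # discard the first members of the sequence
print(f"acceptance rate {rate:.2f}, mean {equilibrated.mean():.3f}, "
      f"variance {equilibrated.var():.3f}")
```

For this weight the sampled mean and variance should approach 0 and 1, respectively, although only the ratio $p(x_t)/p(x_n)$ ever enters the algorithm.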

A discussion of the underlying concepts and of why the choice (14.41) indeed
samples random numbers according to the pdf $q(x)$ requires some basic knowledge
of stochastics in general and of Markov-chains in particular. This is the reason
why we postpone this discussion to Chap. 16. Nevertheless, there is a particular
property, referred to as detailed balance, which requires our attention because it
is crucial for the Metropolis algorithm: Let $p(x_t|x_n)$ denote the
probability that a random number $x_t$ is generated from the random number $x_n$, as
defined in Eq. (14.43). Then the condition of detailed balance is defined as

$$p(x_t|x_n)\, q(x_n) = p(x_n|x_t)\, q(x_t) . \qquad (14.44)$$

In words: The probability $p(x_t|x_n)$ that a random number $x_t$ is generated from a
random number $x_n$ times the probability $q(x_n)$ that the random number $x_n$ occurred
at all is equal to the probability $p(x_n|x_t)$ that the random number $x_n$ is generated
from $x_t$ times the probability $q(x_t)$ that $x_t$ occurred. Detailed balance is motivated by
physics and is a condition of thermodynamic equilibrium.

Let us briefly demonstrate that the Metropolis algorithm (14.43) satisfies
detailed balance. We distinguish three different cases: (i) Suppose that $p(x_t|x_n) =
p(x_n|x_t) = 1$. From Eq. (14.43) we note that this is only possible if $p(x_t) = p(x_n)$
and therefore $q(x_t) = q(x_n)$, which is already Eq. (14.44) for this particular case. (ii)
We assume that $p(x_t|x_n) = 1$ but $p(x_n|x_t) \ne 1$. It follows from Eq. (14.43) that

$$p(x_n|x_t)\, q(x_t) = \frac{p(x_n)}{p(x_t)}\, q(x_t) = \frac{p(x_n)}{p(x_t)} \frac{p(x_t)}{Z} = \frac{p(x_n)}{Z} = q(x_n) . \qquad (14.45)$$

This corresponds to Eq. (14.44) for $p(x_t|x_n) = 1$. Note that we made use of
definition (14.37) in order to achieve this result. (iii) Finally, we find for $p(x_n|x_t) = 1$
and $p(x_t|x_n) \ne 1$ that

$$p(x_t|x_n)\, q(x_n) = \frac{p(x_t)}{p(x_n)}\, q(x_n) = \frac{p(x_t)}{p(x_n)} \frac{p(x_n)}{Z} = \frac{p(x_t)}{Z} = q(x_t) , \qquad (14.46)$$

which, again, is Eq. (14.44). Hence, the Metropolis algorithm (14.43) indeed
obeys detailed balance.
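The three cases above can also be verified numerically for all pairs of states at once. The following sketch assumes a small discrete state space with six arbitrary positive weights, which is an illustrative simplification of the continuous case; the array names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(5)
weights = rng.random(6) + 0.1      # arbitrary positive, unnormalized p(x)
q = weights / weights.sum()        # normalized pdf, Eq. (14.37)

# acceptance[t, n] = p(x_t | x_n) = min(p(x_t) / p(x_n), 1), Eq. (14.43)
acceptance = np.minimum(weights[:, None] / weights[None, :], 1.0)

lhs = acceptance * q[None, :]      # p(x_t | x_n) q(x_n)
rhs = acceptance.T * q[:, None]    # p(x_n | x_t) q(x_t)
print(np.allclose(lhs, rhs))
```

Both sides reduce to $\min[p(x_t), p(x_n)]/Z$, which makes the symmetry of Eq. (14.44) evident.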

So far, the question of how to choose the initialization point $x_0$ of the sequence
has remained unanswered. This is clearly not a trivial problem and it is strongly related
to one of the major disadvantages of the Metropolis algorithm, namely that
subsequent random numbers $(x_n, x_{n+1})$ are strongly correlated. One of the most
pragmatic approaches is to choose a starting point $x_0$ at random out of the parameter
space and then discard it together with the first few members of the sequence.
This approach is strongly motivated by a clear physical picture: The sequence of
random numbers resembles the evolution of the physical system from an arbitrary
initial point $x_0$ toward equilibrium, which manifests itself in the condition of detailed
balance. Hence, the approach of discarding the first few members of the sequence is
referred to as thermalization.

The integral of interest, Eq. (14.35), is then approximated with the help of
Eq. (14.36), where the random numbers $x_k, x_{k+1}, \ldots, x_{k+N}$ are used if the
thermalization required $k$ steps. There is a remedy, based on a similar strategy,
which helps to reduce correlations between subsequent random numbers within
the sequence. In particular, the modified sequence

$$x_k, x_{k+\ell}, x_{k+2\ell}, \ldots , \qquad (14.47)$$

generated by discarding the intermediate random numbers will reduce correlations
between the members of this final sequence of random numbers.
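The effect of thermalization and of the thinned sequence (14.47) can be made visible with a small experiment. The sketch assumes the same unnormalized Gaussian weight as a test case, a deliberately small perturbation width, and the illustrative values $k = 2000$ and $\ell = 50$; the helper `lag1_autocorrelation` is a hypothetical name for the normalized correlation between neighboring members of a sequence.

```python
import numpy as np


def lag1_autocorrelation(x):
    """Normalized correlation between neighboring members of a sequence."""
    dx = x - x.mean()
    return np.sum(dx[:-1] * dx[1:]) / np.sum(dx * dx)


rng = np.random.default_rng(6)
n_steps, step = 100_000, 0.5          # illustrative values
chain = np.empty(n_steps)
x = 5.0                               # deliberately poor starting point x_0
for n in range(n_steps):
    x_t = x + rng.uniform(-step, step)
    # Metropolis acceptance (14.43) for the unnormalized weight exp(-x**2 / 2)
    if rng.random() < min(np.exp(0.5 * (x * x - x_t * x_t)), 1.0):
        x = x_t
    chain[n] = x

k, ell = 2_000, 50                    # thermalization steps and thinning interval
thermalized = chain[k:]
thinned = chain[k::ell]               # x_k, x_{k+ell}, x_{k+2 ell}, ...

print(f"lag-1 autocorrelation, full chain:    {lag1_autocorrelation(thermalized):.3f}")
print(f"lag-1 autocorrelation, thinned chain: {lag1_autocorrelation(thinned):.3f}")
```

Thinning lowers the correlation between neighboring members at the price of a shorter sequence, so suitable values of $k$ and $\ell$ depend on the problem at hand.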


Summary 

This chapter set the stage for an important numerical tool in Computational Physics:
the Monte-Carlo techniques. It started with the conceptually transparent task of how
to calculate $\pi$ using a sequence of uniformly distributed random numbers from the
range [0, 1]. This established the so-called hit and miss technique. It moved on to a
more formal discussion of Monte-Carlo integration and discussed in detail
the error of this type of integration as opposed to the error of
deterministic methods. The conclusion was that Monte-Carlo integration is
certainly preferable whenever estimates of high-dimensional integrals are required
and that it also has advantages when the integrand is heavily structured. The second
part of this chapter dealt with the Metropolis algorithm, which allows one to generate
a sequence of random numbers from some pdf $q(x) \propto p(x)$ without knowledge of
its normalization. It is conceptually similar
to the rejection method discussed earlier. The mathematical background, which is
more involved, was not discussed within this first contact with the Metropolis
algorithm. Instead, the emphasis was to demonstrate that this algorithm obeys
detailed balance, a property purely based on physics as a condition of thermodynamic
equilibrium. It was, furthermore, pointed out that the random numbers
generated by this algorithm are highly correlated, and some strategies to remedy
this problem were discussed.
Chapter 15
The Ising Model


15.1  The  Model 

Ferromagnetic materials are materials which develop a non-vanishing magnetization
$M$ even in the absence of an external magnetic field $B$. It is an experimental observation
that this magnetization decreases smoothly with increasing temperature and
vanishes above the critical temperature $T_c$, referred to as the Curie temperature [1].
Above this temperature the magnetization is zero and the material is no longer
ferromagnetic but paramagnetic. This typical situation is illustrated in Fig. 15.1 and
it is the signature of a phase transition.¹ In a theoretical description of this transition
the magnetization $M$ serves as an order parameter. At $T = T_c$ the system exhibits
a second order phase transition: The magnetization is not differentiable with respect
to $T$; it is, however, continuous.

The microscopic origin of this macroscopic phenomenon is based on the
exchange interaction between identical particles, the atoms or molecules forming
the material. The exchange interaction is a purely quantum-mechanical effect² which
is a consequence of the Coulomb interaction in combination with the Pauli
exclusion principle. For more detailed information please consult Refs. [2-8].

Given two atoms or molecules with spins $S_1$ and $S_2$, where $S_1, S_2 \in \mathbb{R}^3$,³ the
exchange interaction energy is of the form

$$E = J\, S_1 \cdot S_2 , \qquad (15.1)$$

with the exchange constant $J$. The magnitude of $J$ as well as its sign are determined
by overlap integrals which include the Coulomb interaction. If $J < 0$, a parallel
orientation of the spins is energetically favorable and ferromagnetism arises if $T <
T_c$. On the other hand, if $J > 0$, an antiparallel orientation is established as long as
the temperature does not exceed the Néel temperature $T_N$. In both cases
the system undergoes a phase transition to a paramagnetic state if the temperature
$T$ exceeds the Curie temperature (ferromagnetic case) or the Néel temperature
(antiferromagnetic case). Ferro- and antiferromagnetism in a two-dimensional
crystal are illustrated schematically in Fig. 15.2. We summarize the different
scenarios:

$$\begin{cases} J < 0 & \text{ferromagnetic,} \\ J > 0 & \text{antiferromagnetic,} \\ J = 0 & \text{non-interacting.} \end{cases}$$

Fig. 15.1 Schematic illustration of the magnetization $M$ as a function of temperature $T$ in a ferromagnetic material

Fig. 15.2 Schematic illustration of the spin-orientation in a (a) ferromagnetic ($J < 0$) or (b) antiferromagnetic ($J > 0$) two-dimensional crystal

1 For a short introduction to phase transitions in general please consult Appendix F.

2 The statement that magnetism is a purely quantum-mechanical phenomenon that cannot be explained
in classical terms is known as the Bohr-van Leeuwen theorem [3, 4].

3 In this discussion we regard the spin as a classical quantity. In the quantum-mechanical case one
has to replace the vectors by vector operators $\hat{S}_\ell$.

We concentrate on a cubic crystal lattice in which the atoms are localized at
positions $x_\ell$. The spin of atom $\ell$ will be denoted by $S_\ell \in \mathbb{R}^3$ and the exchange
parameter between atom $\ell$ and atom $\ell'$ by $J_{\ell\ell'}$. Furthermore, we consider the
ferromagnetic case with $J_{\ell\ell'} < 0$. The Hamilton function [9-11] is of the form

$$H = \frac{1}{2} \sum_{\ell\ell'} J_{\ell\ell'}\, S_\ell \cdot S_{\ell'} = \frac{1}{2} \sum_{\ell\ell'} J_{\ell-\ell'}\, S_\ell \cdot S_{\ell'} . \qquad (15.2)$$

Here $J_{\ell\ell'}$ was replaced by $J_{\ell-\ell'} = J_{\ell'-\ell}$ to account for translational invariance.
Moreover, we define $J_{\ell\ell} = 0$; otherwise we would have to exclude the
contributions $\ell = \ell'$ from the above sum. The Hamilton function (15.2) is characteristic
of the Heisenberg model [1]. We note that in this model there is no distinguished
direction of spin orientation and, consequently, the Hamilton function is invariant
under a rotation of all spin vectors $S_\ell$. The actual spin orientation may be determined
by an external magnetic field or by an anisotropy of the crystal lattice. Furthermore,
the restriction of the spin orientation to the positive or negative $z$-direction is the
characteristic of the Ising model.

In a quantum mechanical description the Hamilton operator (Hamiltonian) of
the Ising model is defined by

$$H = \frac{1}{2} \sum_{\ell\ell'} J_{\ell-\ell'}\, S^z_\ell S^z_{\ell'} , \qquad (15.3)$$

where $S^z_\ell$ are the spin operators in $z$-direction. If spin 1/2 particles are described by
this Hamiltonian, the spin operators $S^z_\ell$ are replaced by $(\hbar/2)\sigma_\ell$ with $\sigma_\ell$ the Pauli
matrix and $\hbar$ the reduced Planck constant. Furthermore, we redefine $J'_{\ell-\ell'} =
-(\hbar^2/4) J_{\ell-\ell'} > 0$ and represent the Hamiltonian in the basis of eigenstates
of the operators $\sigma_\ell$. These eigenstates have eigenvalues $\sigma_\ell = \pm 1$ which correspond
to spin up and spin down states, respectively. We obtain in this representation

$$H = -\frac{1}{2} \sum_{\ell\ell'} J_{\ell-\ell'}\, \sigma_\ell \sigma_{\ell'} - h \sum_\ell \sigma_\ell , \qquad (15.4)$$

where we dropped the prime on the exchange parameter $J'_{\ell-\ell'}$ for the sake of a more
compact notation. We added, furthermore, a term which accounts for the possible
coupling of the spins to an external magnetic field,⁴ where $h$ stands for the reduced
field $h = -\mu_B g B / 2$.⁵

There are some special cases in which the Ising model can be solved analytically
[12, 13]. For instance, one can treat the general case described by Eq. (15.4) with
the help of the mean field approximation: The field $h_\ell$ acting on site $\ell$,

$$h_\ell = h + \sum_{\ell'} J_{\ell-\ell'}\, \sigma_{\ell'} , \qquad (15.5)$$

is replaced by its mean value

$$\langle h_\ell \rangle = h + J m , \qquad (15.6)$$

where $m = \langle \sigma_\ell \rangle$ and $J = \sum_{\ell'} J_{\ell-\ell'}$. (The term $Jm$ is commonly referred to as the
molecular field.) With the help of this ansatz it is, for instance, possible to reproduce
the experimentally observed Curie-Weiss law of ferromagnetic materials: The
temperature dependence of the magnetic susceptibility $\chi$ for $T > T_c$ can be
described by

$$\chi \propto \frac{1}{T - T_c} . \qquad (15.7)$$

4 We note in passing that the Hamiltonian (15.4) is invariant under a spin flip of all spins if $h = 0$
($\mathbb{Z}_2$ symmetry). This symmetry is broken if $h \ne 0$, i.e. the spins align with the external field $h$.

5 We note that $H \propto \mu \cdot B$, where $B$ is the magnetic field and $\mu$ is the magnetic moment. Furthermore,
$\mu$ can be expressed as $\mu = -\mu_B g S/\hbar = -\mu_B g \sigma/2$, where $\mu_B$ is the Bohr magneton, $g$ is the
Landé $g$-factor and $\sigma$ is the vector of Pauli matrices. The sign is a matter of convention.

Another very interesting special case of the general model (15.4) is the restriction
to nearest neighbor (n. n.) interaction with the assumption that the interaction
between non-nearest neighbor spins is negligible. A further simplification is the
assumption that $J_{\ell-\ell'} = J$ for all nearest neighbors. Hence, we have

$$J_{\ell-\ell'} = \begin{cases} J & \text{if } \ell, \ell' \text{ n. n.,} \\ 0 & \text{otherwise.} \end{cases} \qquad (15.8)$$

In this case Eq. (15.4) is rewritten as

$$H = -\frac{J}{2} \sum_{\langle \ell\ell' \rangle} \sigma_\ell \sigma_{\ell'} - h \sum_\ell \sigma_\ell , \qquad (15.9)$$

where $\sum_{\langle \ell\ell' \rangle}$ denotes the sum over all nearest neighbors. This model can be solved
analytically in one and two dimensions if the system is assumed to be infinitely
extended. The solution in one dimension was published by E. Ising [14]. The
solution in two dimensions, which is much more involved, was reported by L.
Onsager [15].

We briefly discuss Ising's solution in one dimension. The Hamiltonian (15.9)
for $N$ particles aligned in a one-dimensional chain is rewritten as

$$H = -J \sum_{\ell=1}^{N} \sigma_\ell \sigma_{\ell+1} - h \sum_{\ell=1}^{N} \sigma_\ell , \qquad (15.10)$$

where we applied periodic boundary conditions, $\sigma_{N+1} = \sigma_1$, and the factor 1/2 was
absorbed into $J$. Let us now briefly elaborate on the kind of observables we would
like to describe within this model. (We note in passing that the following discussion
is not restricted to the one-dimensional case.) Given a particular spin configuration
$\mathcal{C} = \{\sigma_\ell\}$, we assume that the probability of finding the system in this configuration
is given by the Boltzmann distribution $p(\mathcal{C})$:⁶

$$p(\mathcal{C}) = \frac{1}{Z_N} \exp\left[ -\frac{E(\mathcal{C})}{k_B T} \right] . \qquad (15.11)$$


Here, $T$ is the temperature and $k_B$ is Boltzmann's constant. The energy $E(\mathcal{C})$
associated with configuration $\mathcal{C}$ is given by Eq. (15.10). Please note that now,
obviously, we have to treat the model in the classical sense, although we consider
spin degrees of freedom. The partition function $Z_N$ is given by the sum over all
possible configurations $\mathcal{C}$ [3, 4, 16]:

$$Z_N = \sum_{\mathcal{C}} \exp\left[ -\frac{E(\mathcal{C})}{k_B T} \right] . \qquad (15.12)$$

In general, the task of solving the Ising problem is the problem of how to evaluate the
sum (15.12). This is certainly not trivial since, for instance, in the one-dimensional
case with $N = 100$ grid-points one has $2^N = 2^{100} \approx 1.3 \times 10^{30}$ different
configurations $\mathcal{C}$. On the other hand, once $Z_N$ has been determined, more information
about the properties of the system can be derived [2, 12, 13]. For instance, the
expectation value of the energy⁷ is given by

$$\langle E \rangle = \sum_{\mathcal{C}} E(\mathcal{C})\, p(\mathcal{C}) = k_B T^2 \frac{\partial}{\partial T} \ln Z_N , \qquad (15.13)$$

and the expectation value of the magnetization follows from

$$\langle M \rangle = \sum_{\mathcal{C}} \mathcal{M}(\mathcal{C})\, p(\mathcal{C}) = k_B T \frac{\partial}{\partial h} \ln Z_N , \qquad (15.14)$$


where we defined the magnetization $\mathcal{M}(\mathcal{C})$ of a configuration $\mathcal{C}$ via:

$$\mathcal{M}(\mathcal{C}) = \left( \sum_\ell \sigma_\ell \right)_{\mathcal{C}} . \qquad (15.15)$$

The term $\sum_\ell \sigma_\ell$ was placed within parentheses indexed by $\mathcal{C}$ to emphasize
its dependence on the particular configuration $\mathcal{C}$. From the observables (15.13)
and (15.14) the fluctuation quantities, namely, the magnetic susceptibility, $\chi$, and
the heat capacity, $c_h$, can be derived. The following relations hold:

$$\chi = \frac{\partial}{\partial h} \langle M \rangle \quad \text{and} \quad c_h = \frac{\partial}{\partial T} \langle E \rangle . \qquad (15.16)$$

6 In particular we assume ergodicity of the system as will be explained in Chap. 16.

7 $\langle E \rangle$ is also referred to as internal energy $U$.

Equation (15.13) is applied to rewrite the expression for the heat capacity:

$$c_h = \sum_{\mathcal{C}} E(\mathcal{C}) \frac{\partial}{\partial T} p(\mathcal{C}) . \qquad (15.17)$$


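Here we made use of the fact that $E(\mathcal{C})$ is independent of temperature $T$. We
evaluate, furthermore, the derivative of $p(\mathcal{C})$ with respect to temperature $T$:

$$\frac{\partial}{\partial T} p(\mathcal{C}) = \frac{\partial}{\partial T} \left\{ \frac{1}{Z_N} \exp\left[ -\frac{E(\mathcal{C})}{k_B T} \right] \right\} = \frac{p(\mathcal{C})}{k_B T^2} \left[ E(\mathcal{C}) - \langle E \rangle \right] . \qquad (15.18)$$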


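This is inserted into Eq. (15.17) and results in a final expression for the heat capacity:

$$\begin{aligned}
c_h &= \frac{1}{k_B T^2} \sum_{\mathcal{C}} p(\mathcal{C}) \left[ E^2(\mathcal{C}) - E(\mathcal{C}) \langle E \rangle \right] \\
&= \frac{1}{k_B T^2} \left( \langle E^2 \rangle - \langle E \rangle^2 \right) \\
&= \frac{1}{k_B T^2} \mathrm{var}(E) .
\end{aligned} \qquad (15.19)$$

This result justifies why the heat capacity is referred to as a fluctuation quantity.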

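We now determine, following the same ideas, the magnetic susceptibility using
relation (15.14):

$$\chi = \sum_{\mathcal{C}} \mathcal{M}(\mathcal{C}) \frac{\partial}{\partial h} p(\mathcal{C}) . \qquad (15.20)$$

We note that

$$\frac{\partial}{\partial h} E(\mathcal{C}) = -\mathcal{M}(\mathcal{C}) , \qquad (15.21)$$

and obtain:

$$\frac{\partial}{\partial h} p(\mathcal{C}) = \frac{p(\mathcal{C})}{k_B T} \left[ \mathcal{M}(\mathcal{C}) - \langle M \rangle \right] . \qquad (15.22)$$

This results in a final expression for the magnetic susceptibility $\chi$ which relates it to
the variance of the magnetization $M$:

$$\begin{aligned}
\chi &= \frac{1}{k_B T} \sum_{\mathcal{C}} p(\mathcal{C}) \left[ \mathcal{M}^2(\mathcal{C}) - \mathcal{M}(\mathcal{C}) \langle M \rangle \right] \\
&= \frac{1}{k_B T} \left( \langle M^2 \rangle - \langle M \rangle^2 \right) \\
&= \frac{1}{k_B T} \mathrm{var}(M) .
\end{aligned} \qquad (15.23)$$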
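The fluctuation relations (15.19) and (15.23) can be compared with the defining derivatives (15.16) for a small chain. The sketch assumes units with $k_B = 1$, exact enumeration of a periodic chain of $N = 8$ spins, and central finite differences with a step `eps`; all parameter values and the function name `averages` are illustrative assumptions.

```python
import itertools
import numpy as np


def averages(N, J, h, kT):
    """Exact <E>, <M>, var(E) and var(M) of a periodic chain by enumeration."""
    s = np.array(list(itertools.product((-1, 1), repeat=N)))
    M = s.sum(axis=1)
    E = -J * np.sum(s * np.roll(s, -1, axis=1), axis=1) - h * M   # Eq. (15.10)
    w = np.exp(-E / kT)
    p = w / w.sum()                                               # Eq. (15.11)
    mean_E, mean_M = p @ E, p @ M
    return mean_E, mean_M, p @ E**2 - mean_E**2, p @ M**2 - mean_M**2


N, J, h, kT = 8, 1.0, 0.3, 1.5   # illustrative parameters, k_B = 1
mean_E, mean_M, var_E, var_M = averages(N, J, h, kT)

c_h = var_E / kT**2              # Eq. (15.19)
chi = var_M / kT                 # Eq. (15.23)

# Central finite differences of <E> and <M>, Eq. (15.16), for comparison.
eps = 1e-5
c_h_fd = (averages(N, J, h, kT + eps)[0] - averages(N, J, h, kT - eps)[0]) / (2 * eps)
chi_fd = (averages(N, J, h + eps, kT)[1] - averages(N, J, h - eps, kT)[1]) / (2 * eps)

print(f"c_h: {c_h:.6f} (fluctuation) vs {c_h_fd:.6f} (derivative)")
print(f"chi: {chi:.6f} (fluctuation) vs {chi_fd:.6f} (derivative)")
```

In a simulation the variances are directly accessible from the sampled configurations, which is why the fluctuation form is usually preferred over numerical differentiation.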


After this excursion, we return to the analytic treatment of the infinite one-dimensional
Ising model with nearest neighbor interaction, Eq. (15.10). If it were
possible to evaluate the partition function $Z_N$, the required observables would be
directly accessible via the above relations. In most cases this task is not analytically
feasible. Nevertheless, in our particular case it is possible because
we can actually evaluate Eq. (15.12) explicitly by keeping in mind
Eq. (15.9):

$$\begin{aligned}
Z_N &= \sum_{\mathcal{C}} \exp\left[ -\frac{E(\mathcal{C})}{k_B T} \right] \\
&= \sum_{\mathcal{C}} \exp\left[ \frac{1}{k_B T} \left( J \sum_{\ell=1}^{N} \sigma_\ell \sigma_{\ell+1} + h \sum_{\ell=1}^{N} \sigma_\ell \right) \right] \\
&= \sum_{\mathcal{C}} \prod_{\ell=1}^{N} \exp\left[ \frac{J}{k_B T} \sigma_\ell \sigma_{\ell+1} + \frac{h}{2 k_B T} (\sigma_\ell + \sigma_{\ell+1}) \right] .
\end{aligned} \qquad (15.24)$$

In the last step the sum over $\sigma_\ell$ was replaced by an alternative sum,

$$\sum_{\ell=1}^{N} \sigma_\ell = \frac{1}{2} \sum_{\ell=1}^{N} (\sigma_\ell + \sigma_{\ell+1}) , \qquad (15.25)$$

which is a consequence of the periodic boundary conditions $\sigma_{N+1} = \sigma_1$. Equation (15.24)
can be rewritten as

$$Z_N = \mathrm{tr}\left( \mathcal{T}^N \right) , \qquad (15.26)$$


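where $\mathrm{tr}(\cdot)$ denotes the trace operation and we introduced the transfer matrix $\mathcal{T}$
with the elements

$$T_{\sigma,\sigma'} = \exp\left[ \frac{J}{k_B T} \sigma\sigma' + \frac{h}{2 k_B T} (\sigma + \sigma') \right] . \qquad (15.27)$$

Let us briefly clarify this point: The trace operation in the basis of the spin
eigenvalues $\sigma = \pm 1$ results in

$$\mathrm{tr}(\mathcal{T}) = \sum_\sigma T_{\sigma,\sigma} = T_{-1,-1} + T_{1,1} . \qquad (15.28)$$

Hence, we have

$$\begin{aligned}
\mathrm{tr}\left( \mathcal{T}^N \right) &= \sum_\sigma \left( \mathcal{T}^N \right)_{\sigma,\sigma} \\
&= \sum_\sigma \sum_{\substack{\{\sigma_\ell\} \\ \ell = 1, \ldots, N-1}} T_{\sigma,\sigma_1} T_{\sigma_1,\sigma_2} \cdots T_{\sigma_{N-1},\sigma} \\
&= \sum_{\{\sigma_\ell\}} T_{\sigma_1,\sigma_2} T_{\sigma_2,\sigma_3} \cdots T_{\sigma_N,\sigma_1} .
\end{aligned} \qquad (15.29)$$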

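In the last step we redefined the sum indices and we used the notation $\{\sigma_\ell\}$ to indicate
that the sum runs over all possible values of $\sigma_1, \sigma_2, \ldots, \sigma_N$ in order to abbreviate
the notation. However, the sum over all possible values of $\sigma_1, \sigma_2, \ldots, \sigma_N$ can be
replaced by a sum over all configurations $\mathcal{C}$, where one configuration is a specific
combination of definite values $\sigma_1, \sigma_2, \ldots, \sigma_N$. For these definite values the product
of transfer matrix elements in Eq. (15.29) is equivalent to the product of exponentials in
Eq. (15.24) due to our definition of the transfer matrix $\mathcal{T}$. Hence, we demonstrated
that expression (15.26) is indeed equivalent to Eq. (15.24).

It follows from definition (15.27) that

$$\mathcal{T} = \begin{pmatrix} \exp\left( \frac{J + h}{k_B T} \right) & \exp\left( -\frac{J}{k_B T} \right) \\ \exp\left( -\frac{J}{k_B T} \right) & \exp\left( \frac{J - h}{k_B T} \right) \end{pmatrix} . \qquad (15.30)$$

It is an easy task to determine the eigenvalues of this matrix [17, 18]. The
characteristic polynomial

$$\det \begin{pmatrix} \exp\left( \frac{J + h}{k_B T} \right) - \lambda & \exp\left( -\frac{J}{k_B T} \right) \\ \exp\left( -\frac{J}{k_B T} \right) & \exp\left( \frac{J - h}{k_B T} \right) - \lambda \end{pmatrix} = 0 \qquad (15.31)$$

is of the form

$$\begin{aligned}
&\left[ \exp\left( \frac{J + h}{k_B T} \right) - \lambda \right] \left[ \exp\left( \frac{J - h}{k_B T} \right) - \lambda \right] - \exp\left( -\frac{2J}{k_B T} \right) \\
&\quad = \lambda^2 - 2\lambda \exp\left( \frac{J}{k_B T} \right) \cosh\left( \frac{h}{k_B T} \right) + 2 \sinh\left( \frac{2J}{k_B T} \right) = 0 ,
\end{aligned} \qquad (15.32)$$

which is easily solved. We get for the two eigenvalues $\lambda_{1,2}$

$$\lambda_{1,2} = \exp\left( \frac{J}{k_B T} \right) \cosh\left( \frac{h}{k_B T} \right) \pm \sqrt{ \exp\left( \frac{2J}{k_B T} \right) \sinh^2\left( \frac{h}{k_B T} \right) + \exp\left( -\frac{2J}{k_B T} \right) } , \qquad (15.33)$$

and note that $\lambda_1 > \lambda_2$ for all temperatures $T > 0$.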

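We now make use of the fact that the trace is invariant under a basis transformation
$\mathcal{R}$. Hence, we can express the transfer matrix in a basis in which it is diagonal
and set

$$\mathcal{T}' = \mathcal{R}\, \mathcal{T} \mathcal{R}^{-1} = \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix} , \qquad (15.34)$$

which immediately results in:

$$Z_N = \lambda_1^N + \lambda_2^N . \qquad (15.35)$$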

Everything required to calculate the expectation value of the energy per particle $\langle \varepsilon \rangle$,

$$\langle \varepsilon \rangle = \frac{k_B T^2}{N} \frac{\partial}{\partial T} \ln Z_N , \qquad (15.36)$$

in the thermodynamic limit $N \to \infty$ is now in place and, thus, we can investigate
the possibility of a phase transition. First, we consider the limit

$$\lim_{N \to \infty} \frac{1}{N} \ln Z_N = \lim_{N \to \infty} \frac{1}{N} \ln\left( \lambda_1^N + \lambda_2^N \right) = \ln \lambda_1 , \qquad (15.37)$$

since $\lambda_1 > \lambda_2$ for all $T > 0$.⁸

If there is no external field, i.e. $h = 0$, we have

$$\lim_{N \to \infty} \frac{1}{N} \ln Z_N = \ln\left[ 2 \cosh\left( \frac{J}{k_B T} \right) \right] , \qquad (15.38)$$
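How quickly the finite chain approaches the limit (15.38) can be examined numerically. The sketch assumes $h = 0$ and units with $k_B = 1$, for which Eq. (15.33) gives $\lambda_1 = 2\cosh(J/k_B T)$ and $\lambda_2 = 2\sinh(J/k_B T)$. It also evaluates the energy per particle from Eqs. (15.36) and (15.38) by a finite difference; differentiating analytically yields $\langle \varepsilon \rangle = -J \tanh(J/k_B T)$, a result derived here for comparison rather than quoted from the text. The function names are hypothetical.

```python
import numpy as np


def log_Z_per_spin(N, J, kT):
    """(1/N) ln Z_N for h = 0, written so that large N does not overflow."""
    lam1 = 2.0 * np.cosh(J / kT)
    lam2 = 2.0 * np.sinh(J / kT)
    return np.log(lam1) + np.log1p((lam2 / lam1)**N) / N


def energy_per_spin(J, kT, eps=1e-6):
    """k_B T^2 d/dT ln(lambda_1) via a central difference (k_B = 1)."""
    f = lambda t: np.log(2.0 * np.cosh(J / t))
    return kT**2 * (f(kT + eps) - f(kT - eps)) / (2.0 * eps)


J, kT = 1.0, 1.0
limit = np.log(2.0 * np.cosh(J / kT))                 # Eq. (15.38)
for N in (2, 10, 100, 1000):
    print(N, log_Z_per_spin(N, J, kT) - limit)
print(energy_per_spin(J, kT), -J * np.tanh(J / kT))
```

The correction to the limit decays like $(\lambda_2/\lambda_1)^N / N$, so even moderately long chains are very close to the thermodynamic limit at this temperature.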


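which is a smooth function of $T$ for $T > 0$. Consequently, we do not observe a
phase transition in the one-dimensional Ising model. Even more information about
the system can be provided by the spin correlation function $\langle \sigma_\ell \sigma_{\ell'} \rangle$,

$$\langle \sigma_\ell \sigma_{\ell'} \rangle = \sum_{\mathcal{C}} (\sigma_\ell \sigma_{\ell'})_{\mathcal{C}}\, p(\mathcal{C}) . \qquad (15.39)$$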

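A basic, albeit tedious, calculation shows that in the thermodynamic limit it is
described by

$$\langle \sigma_\ell \sigma_{\ell'} \rangle = \left( \frac{\lambda_2}{\lambda_1} \right)^{|\ell - \ell'|} , \qquad (15.40)$$

with the result that the spin correlation decreases with increasing distance $|\ell - \ell'|$
since $\lambda_2 < \lambda_1$ for $T > 0$.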

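We move on and briefly sketch the solution of the infinite two-dimensional Ising
model according to L. Onsager [15]. The Hamilton function (15.10) changes
into:

$$H = -\frac{J}{2} \sum_{\ell,\ell'} \sigma_{\ell,\ell'} \left( \sigma_{\ell+1,\ell'} + \sigma_{\ell-1,\ell'} + \sigma_{\ell,\ell'-1} + \sigma_{\ell,\ell'+1} \right) - h \sum_{\ell,\ell'} \sigma_{\ell,\ell'} . \qquad (15.41)$$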


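8 We transform

$$\frac{1}{N} \ln\left( \lambda_1^N + \lambda_2^N \right) = \ln \lambda_1 + \frac{1}{N} \ln\left[ 1 + \left( \frac{\lambda_2}{\lambda_1} \right)^N \right]$$

and use that

$$\left( \frac{\lambda_2}{\lambda_1} \right)^N \to 0 \quad \text{as} \quad N \to \infty .$$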
The strategy developed for the one-dimensional case can again be applied: The
system is treated as a classical system with spin degrees of freedom. The Hamilton
function (15.41) is inserted into the expression for the partition function
$Z_N$, Eq. (15.12). With the help of the correct basis $Z_N$ can be described by the trace
over a product of transfer matrices. However, in this case the transfer matrix $\mathcal{T}$ is of
dimension $2^N \times 2^N$ rather than $2 \times 2$. It is quite obvious that the search for the largest
eigenvalue for arbitrary values of $N$ is not a trivial task. Therefore, we limit our
discussion to a summary of the most important results for the particular case $h = 0$.

In the two-dimensional case a phase transition is indeed observed: The magnetic
susceptibility becomes singular at a particular temperature $T_c$. This temperature is
given as the solution of the equation:

$$2 \tanh^2\left( \frac{2J}{k_B T_c} \right) = 1 . \tag{15.42}$$
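Equation (15.42) can be solved for $k_B T_c$ numerically. The following is a minimal sketch in Python, not part of the original text: the function names are hypothetical, $k_B$ is absorbed into the variable `kT`, and the default `J = 0.5` is the value used later in Sect. 15.3. It applies bisection to Eq. (15.42) and compares the result with the equivalent closed form $\sinh[2J/(k_B T_c)] = 1$, i.e. $k_B T_c = 2J/\ln(1 + \sqrt{2})$.

```python
import math


def critical_temperature(J=0.5):
    """Solve 2*tanh(2J/kT)**2 = 1 for kT by bisection (assumes J > 0)."""
    def f(kT):
        return 2.0 * math.tanh(2.0 * J / kT) ** 2 - 1.0

    # f decreases monotonically with kT: f > 0 at low kT, f < 0 at high kT.
    lo, hi = 0.1 * J, 10.0 * J
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid      # root lies at higher temperature
        else:
            hi = mid      # root lies at lower temperature
    return 0.5 * (lo + hi)


def critical_temperature_closed_form(J=0.5):
    """Closed form following from sinh(2J/kT_c) = 1."""
    return 2.0 * J / math.log(1.0 + math.sqrt(2.0))
```

For $J = 0.5$ both routines give $k_B T_c \approx 1.1346$, so a temperature such as $k_B T = 3$ lies well inside the paramagnetic regime.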


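The expectation value of the energy per particle takes on the form

$$\langle \varepsilon \rangle = -J \coth\left( \frac{2J}{k_B T} \right) \left\{ 1 + \frac{2}{\pi} K(\xi) \left[ 2 \tanh^2\left( \frac{2J}{k_B T} \right) - 1 \right] \right\} , \tag{15.43}$$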


where $K(\xi)$ is the complete elliptic integral of the first kind [see Eq. (1.14)] with
the argument:

$$\xi = \frac{2 \sinh\left( \frac{2J}{k_B T} \right)}{\cosh^2\left( \frac{2J}{k_B T} \right)} . \tag{15.44}$$


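The magnetization per particle $\langle m \rangle$ can be shown to be given by

$$\langle m \rangle = \begin{cases} \dfrac{(1 + z^2)^{1/4} \left( 1 - 6 z^2 + z^4 \right)^{1/8}}{\sqrt{1 - z^2}} & \text{for } T < T_c , \\[2ex] 0 & \text{for } T > T_c , \end{cases} \tag{15.45}$$

with

$$z = \exp\left( -\frac{2J}{k_B T} \right) .$$

Equation (15.45) clearly describes a phase transition at $T = T_c$.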
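The analytic results (15.43) to (15.45) are the reference against which the simulation is checked later, so it is convenient to have them as functions. The sketch below is an illustration only: the function names are hypothetical, $k_B$ is absorbed into `kT`, and it assumes that $K(\xi)$ is defined with $\xi$ as the modulus, $K(\xi) = \int_0^{\pi/2} \mathrm{d}\theta\, (1 - \xi^2 \sin^2\theta)^{-1/2}$. The elliptic integral is evaluated with the arithmetic-geometric mean, $K(\xi) = \pi / [2\, \mathrm{AGM}(1, \sqrt{1 - \xi^2})]$, so that only the standard library is needed.

```python
import math


def ellipk_modulus(k):
    """Complete elliptic integral of the first kind K(k) for modulus 0 <= k < 1,
    using K(k) = pi / (2 * AGM(1, sqrt(1 - k^2)))."""
    a, b = 1.0, math.sqrt(max(0.0, 1.0 - k * k))
    for _ in range(60):                      # AGM converges quadratically
        a, b = 0.5 * (a + b), math.sqrt(a * b)
    return math.pi / (2.0 * a)


def energy_per_particle(kT, J=0.5):
    """Eq. (15.43); not meant to be evaluated exactly at T_c, where K diverges."""
    x = 2.0 * J / kT
    xi = 2.0 * math.sinh(x) / math.cosh(x) ** 2          # Eq. (15.44)
    bracket = 2.0 * math.tanh(x) ** 2 - 1.0
    return -J / math.tanh(x) * (1.0 + 2.0 / math.pi * ellipk_modulus(xi) * bracket)


def magnetization_per_particle(kT, J=0.5):
    """Eq. (15.45) with z = exp(-2J/kT); returns 0 above T_c."""
    z = math.exp(-2.0 * J / kT)
    arg = 1.0 - 6.0 * z ** 2 + z ** 4       # changes sign at T_c
    if arg <= 0.0:
        return 0.0
    return (1.0 + z ** 2) ** 0.25 * arg ** 0.125 / math.sqrt(1.0 - z ** 2)
```

In the low-temperature limit the energy per particle approaches $-2J$ and the magnetization approaches one, while for $T > T_c$ the magnetization vanishes identically.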
15.2  Numerics

We study a finite two-dimensional ISING model on a square lattice $\Omega$ with
grid-points $(x_i, y_j)$, $i, j = 1, 2, \ldots, N$, which will be denoted by $(i,j)$. We write the
Hamilton function in the form

$$H = -J \sum_{\langle ij, i'j' \rangle} \sigma_{ij} \sigma_{i'j'} - h \sum_{ij} \sigma_{ij} , \tag{15.46}$$

where the $\sigma_{ij} \in \{-1, 1\}$ are treated as ‘classical’ spins. We consider nearest
neighbor interaction and regard the exchange parameter $J$ as independent of the actual
positions $ij$. The problem is easily motivated: We calculate numerically observables
like the expectation value of the energy or of the magnetization which will then be
compared with analytic results. Such a procedure provides a rather simple check
of the quality of the numerical approach which can then be extended to similar
models which can no longer be treated analytically. We need numerical methods
because summing over all possible configurations in a calculation of the partition
function $Z_N$ is simply no longer feasible since, for instance, for $N = 100$ we have
$2^{N^2} = 2^{10000} \approx 10^{3000}$ possible configurations which will have to be considered as
follows from Eqs. (15.12), (15.13), and (15.14). A more convenient approach is
to approximate the sums with the help of methods we encountered within the
context of Monte-Carlo integration in Sect. 14.2. For instance, the estimate of the
energy expectation value is given by


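$$\langle E \rangle \approx \frac{1}{M} \sum_{i=1}^{M} E(\mathcal{C}_i) \pm \sqrt{\frac{\mathrm{var}(E)}{M}} . \tag{15.47}$$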


Here, $\mathcal{C}_i$, $i = 1, 2, \ldots, M$, are $M$ configurations drawn from the pdf (15.11), the
Boltzmann distribution. Equation (15.47) is referred to as the estimator of the
internal energy. We note that we also have to calculate an estimate of the variance
of $E$ using a similar approach in order to determine the error induced by this
approximation.$^{9}$
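As a small illustration of such an estimator, the following Python sketch computes the sample mean of a list of measured values together with the error $\sqrt{\mathrm{var}/M}$. The function name is hypothetical, and the sketch assumes that the $M$ values are statistically independent; correlated measurements would lead to an underestimated error.

```python
import math


def estimate_with_error(samples):
    """Return (mean, error) with error = sqrt(var / M) for M samples."""
    M = len(samples)
    mean = sum(samples) / M
    mean_sq = sum(s * s for s in samples) / M
    var = max(mean_sq - mean * mean, 0.0)   # guard against round-off below zero
    return mean, math.sqrt(var / M)
```

Because the error scales as $1/\sqrt{M}$, repeating the same spread of values with four times as many measurements halves the returned error.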

Hence, there remains the task to find configurations $\mathcal{C}_i$ which follow the
Boltzmann distribution (15.11). The inverse transformation method of Sect. 13.2
cannot be applied since $E(\mathcal{C})$ is not invertible. Furthermore, the rejection method
is useless since we would need the partition function $Z_N$ to make it work. However,
calculating the partition function is a task as difficult as calculating the internal
energy (15.13) without any approximations. Therefore, the method of choice will
be the Metropolis algorithm discussed in Sect. 14.3.

$^{9}$ In particular, $\mathrm{var}(E) = \langle E^2 \rangle - \langle E \rangle^2$ is to be determined and only the second term is already known.
The first term, $\langle E^2 \rangle$, is then estimated with the help of

$$\langle E^2 \rangle \approx \frac{1}{M} \sum_{i=1}^{M} E^2(\mathcal{C}_i) .$$

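Let $\mathcal{C}$ be a given spin configuration on the two-dimensional square lattice
$\Omega$. We modify the spin on one particular grid-point $(i,j)$ and obtain a trial spin
configuration $\mathcal{C}^t$. According to our discussion in Sect. 14.3, the probability of
accepting the new configuration $\mathcal{C}^t$ is then given by

$$\Pr(A | \mathcal{C}^t, \mathcal{C}) = \min\left[ \frac{p(\mathcal{C}^t)}{p(\mathcal{C})}, 1 \right] = \min\left\{ \exp\left[ -\frac{E(\mathcal{C}^t) - E(\mathcal{C})}{k_B T} \right], 1 \right\} = \min\left[ \exp\left( -\frac{\Delta E_{ij}}{k_B T} \right), 1 \right] . \tag{15.48}$$

The spin orientation was changed only on one grid-point $(i,j)$, with $\sigma_{ij} \to -\sigma_{ij}$;
thus, the energy difference $\Delta E_{ij}$ is easily evaluated using

$$\Delta E_{ij} = 2 J \sigma_{ij} \left( \sigma_{i+1,j} + \sigma_{i-1,j} + \sigma_{i,j-1} + \sigma_{i,j+1} \right) + 2 h \sigma_{ij} , \tag{15.49}$$

with $\sigma_{ij}$ the original spin orientation.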
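The local energy difference can be checked against a brute-force evaluation of the Hamilton function. The sketch below is illustrative only: the function names are hypothetical, the lattice is stored as a list of lists with 0-based indices (rather than the 1-based labels of the text), periodic boundary conditions are assumed, and the lattice should have $N \geq 3$ so that the four neighbors of a site are distinct.

```python
def delta_energy(spins, i, j, J=0.5, h=0.0):
    """Energy change for flipping spin (i, j): 2*J*s*(sum of neighbors) + 2*h*s."""
    N = len(spins)
    s = spins[i][j]
    nn = (spins[(i + 1) % N][j] + spins[(i - 1) % N][j]
          + spins[i][(j + 1) % N] + spins[i][(j - 1) % N])
    return 2.0 * J * s * nn + 2.0 * h * s


def total_energy(spins, J=0.5, h=0.0):
    """H = -J * sum over nearest-neighbor pairs - h * sum over sites."""
    N = len(spins)
    E = 0.0
    for i in range(N):
        for j in range(N):
            s = spins[i][j]
            # Each bond is counted once by pairing a site with two of its neighbors.
            E -= J * s * (spins[(i + 1) % N][j] + spins[i][(j + 1) % N])
            E -= h * s
    return E
```

Evaluating `delta_energy` costs a handful of operations per trial flip, whereas recomputing `total_energy` would cost of order $N^2$ operations; this is why the local formula is used inside the Metropolis loop.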

We  focus  now  on  numerical  details,  some  particular  to  the  numerical  treatment  of 
the  Ising  model  [19],  and  some  of  rather  general  nature  which  should  be  considered 
whenever  a  Monte-Carlo  simulation  is  planned. 


(1)  Lattice  Geometry 

We regard a two-dimensional $N \times N$ square lattice with periodic boundary
conditions$^{10}$ in order to reduce finite volume effects. It is advantageous to write a
routine which identifies the nearest neighbors of a given grid-point, since
we need this information in the METROPOLIS run whenever we calculate the
energy difference due to a spin flip according to Eq. (15.49). For this purpose
a matrix neighbor(site, i) is generated only once for each choice of the system
size $N$. Here $i = 1, 2, 3, 4$ are the directions to the neighboring grid-points of the
grid-point site. In a first step the sites of the square lattice are relabeled following
the scheme$^{11}$:

$$\begin{array}{cccc} N(N-1)+1 & \cdots & \cdots & N^2 \\ \vdots & & & \vdots \\ N+1 & \cdots & \cdots & 2N \\ 1 & 2 & \cdots & N \end{array} \tag{15.50}$$

$^{10}$ Periodic boundary conditions in two dimensions imply that

$$\sigma_{N+1,j} = \sigma_{1,j} \quad \text{and} \quad \sigma_{i,N+1} = \sigma_{i,1} ,$$

for all $i, j$.

In the next step, the matrix neighbor is initialized as an array of size $N^2 \times 4$.
Every site has four nearest neighbors: up, right, down, and left. The corresponding
matrix elements for periodic boundary conditions can be evaluated according to the
following scheme:

•  For up we have:

(a)  If site $+\, N \leq N^2$: up = site $+\, N$,

(b)  else if site $+\, N > N^2$: up = site $-\, N(N - 1)$.

•  For right we have:

(a)  If mod(site, $N$) $\neq 0$: right = site $+\, 1$,

(b)  else if mod(site, $N$) $= 0$: right = site $-\, N + 1$.

•  For down we have:

(a)  If site $-\, N \geq 1$: down = site $-\, N$,

(b)  else if site $-\, N < 1$: down = site $+\, N(N - 1)$.

•  For left we have:

(a)  If mod(site $-\, 1$, $N$) $\neq 0$: left = site $-\, 1$,

(b)  else if mod(site $-\, 1$, $N$) $= 0$: left = site $+\, N - 1$.

In a final step, the array elements are rearranged according to

neighbor(site, :) = [up, right, down, left] ,  (15.51)

where site $= 1, 2, \ldots, N^2$.
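The neighbor rules listed above translate almost literally into code. The following sketch is one possible realization in Python: the function name is hypothetical, and a dictionary keyed by the 1-based site label is used (instead of an $N^2 \times 4$ array) so that the labels agree with the single-index notation of the text.

```python
def build_neighbor_table(N):
    """Map each site 1..N*N to [up, right, down, left] with periodic boundaries."""
    neighbor = {}
    for site in range(1, N * N + 1):
        up = site + N if site + N <= N * N else site - N * (N - 1)
        right = site + 1 if site % N != 0 else site - N + 1
        down = site - N if site - N >= 1 else site + N * (N - 1)
        left = site - 1 if (site - 1) % N != 0 else site + N - 1
        neighbor[site] = [up, right, down, left]
    return neighbor
```

For $N = 3$, for example, the corner site 1 obtains the neighbors [4, 2, 7, 3]: moving down or left from the corner wraps around to the opposite edge of the lattice.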


(2)  Initialization 

It has already been discussed in Sect. 14.3 that the quality of random numbers
generated with the help of the Metropolis algorithm is highly dependent on the
choice of initial conditions. In our case, this is the initial spin configuration $\mathcal{C}_0$.


$^{11}$ In the following we will refer to the notation $(i)$, $i = 1, 2, \ldots, N^2$ as the single-index notation
while the notation $(i,j)$, $i, j = 1, 2, \ldots, N$ will be referred to as the double-index notation.
Of course, it would be favorable to start with a configuration which was already
drawn from the Boltzmann distribution $p(\mathcal{C})$. However, in practice this is not
feasible ab initio. But, as will be elucidated in Chap. 16, the Metropolis algorithm
produces configurations which become independent of the initial state and follow
the Boltzmann distribution. Hence we can simply start with some arbitrary
configuration and discard it together with the first $n$ constituents of the sequence
$\mathcal{C}_1, \ldots, \mathcal{C}_n$. This method is referred to as thermalization.$^{12}$ The question arises:
How can $n$ be determined to ensure that the sequence starting with $\mathcal{C}_{n+1}$ will conform to
the pdf $p(\mathcal{C})$?

There are two different ways to approach this problem: (i) The first is to measure
auto-correlations between configurations $\mathcal{C}_i$ where it has to be ensured that the
set of states is sufficiently large to allow for a significant conclusion. We will
discuss auto-correlations in more detail in Chap. 19. (ii) The second approach is
to  empirically  check  whether  equilibrium  has  been  reached  or  not.  For  instance, 
one  could  simply  plot  some  selected  observables  and  check  when  the  initial  bias 
vanishes.  In  this  case  the  observable  reaches  some  saturation  value  as  a  function  of 
the  number  of  measurements.  A  particularly  useful  method  is  to  start  the  algorithm 
with  at  least  two  different  configurations.  As  soon  as  equilibrium  has  been  reached, 
the  observables  should  approach  the  same  saturation  values  after  a  certain  (finite) 
number  of  measurements.  Typical  choices  are  the  cold  start  and  the  hot  start.  Cold 
start  means  that  the  temperature  is  initially  below  the  critical  temperature,  i.e.  in 
the  Ising  model  all  spins  are  aligned  (ferromagnetic  state).  Hot  start  means  that  the 
temperature  is  well  above  the  critical  temperature  and  for  the  Ising  model  the  spin 
orientation  is  chosen  at  random  for  any  site  (paramagnetic  state). 
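The two starting configurations can be generated as in the following minimal sketch. The function names are hypothetical and the lattice is stored as a list of lists with entries $\pm 1$; the optional `rng` argument only serves to make a run reproducible.

```python
import random


def cold_start(N):
    """All spins aligned (ferromagnetic state)."""
    return [[1] * N for _ in range(N)]


def hot_start(N, rng=random):
    """Every spin chosen at random with equal probability (paramagnetic state)."""
    return [[rng.choice((-1, 1)) for _ in range(N)] for _ in range(N)]
```

Running the same simulation once from `cold_start(N)` and once from `hot_start(N)` and comparing the measured observables is the empirical equilibrium check described in (ii).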


(3)  Execution  of  the  Code 

The Metropolis algorithm for the Ising model is executed in the following
steps:

1.  Choose an initial configuration $\mathcal{C}_0$.

2.  We migrate through the lattice sites systematically.$^{13}$ Suppose we just reached
site $(i,j)$ (we use the double-index notation $i, j = 1, 2, \ldots, N$ to improve the
readability) and our current configuration is $\mathcal{C}_k$. Then $k$ configurations have been
accepted so far. We generate a new configuration $\mathcal{C}^t$ from $\mathcal{C}_k$ by replacing in $\mathcal{C}_k$
the entry $\sigma_{ij}$ by $-\sigma_{ij}$.


12 The  number  of  configurations  discarded  is  referred  to  as  the  thermalization  length. 
13  A  migration  through  all  lattice  sites  is  referred  to  as  a  sweep. 
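3.  The new configuration is accepted with probability

$$\Pr(A | \mathcal{C}^t, \mathcal{C}_k) = \min\left[ \exp\left( -\frac{\Delta E_{ij}}{k_B T} \right), 1 \right] , \tag{15.52}$$

where $\Delta E_{ij}$ is determined from Eq. (15.49). $\mathcal{C}^t$ is accepted if $\Pr(A | \mathcal{C}^t, \mathcal{C}_k)$ is
equal to one or if $\Pr(A | \mathcal{C}^t, \mathcal{C}_k) > r$, where $r \in [0, 1]$ is a uniformly distributed
random number; otherwise $\mathcal{C}^t$ is rejected. If $\mathcal{C}^t$ was accepted we set $\mathcal{C}_{k+1} = \mathcal{C}^t$.

4.  Go to the next lattice site [step 2].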
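Steps 2 to 4 together form one sweep through the lattice. A minimal Python sketch of such a sweep is given below; it is an illustration under stated assumptions rather than a reference implementation. The function name is hypothetical, $k_B$ is absorbed into `kT`, the lattice is a list of lists with 0-based indices and periodic boundaries, and the neighbors are found with modular arithmetic instead of a precomputed table.

```python
import math
import random


def metropolis_sweep(spins, kT, J=0.5, h=0.0, rng=random):
    """Visit every site once in sequential order and apply the Metropolis rule.
    The lattice is modified in place; the number of accepted flips is returned."""
    N = len(spins)
    accepted = 0
    for i in range(N):
        for j in range(N):
            s = spins[i][j]
            nn = (spins[(i + 1) % N][j] + spins[(i - 1) % N][j]
                  + spins[i][(j + 1) % N] + spins[i][(j - 1) % N])
            dE = 2.0 * J * s * nn + 2.0 * h * s      # energy change of the flip
            # Accept with probability min[exp(-dE/kT), 1]: a flip that does not
            # raise the energy is always accepted, otherwise compare with r.
            if dE <= 0.0 or rng.random() < math.exp(-dE / kT):
                spins[i][j] = -s
                accepted += 1
    return accepted
```

At very low temperature the rule acts almost deterministically: a single flipped spin in an otherwise aligned lattice is restored during the next sweep, while flips that would raise the energy are practically never accepted.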

We note that instead of sampling the lattice sites sequentially as suggested in step
2 the lattice sites can also be sampled randomly with the help of

$$i = \mathrm{int}(r N^2) + 1 , \tag{15.53}$$

where $r \in [0, 1)$ is a uniformly distributed random number and int(·) denotes the
integer part of a given quantity. Obviously, Eq. (15.53) is only useful in the
single-index notation $i = 1, 2, \ldots, N^2$.
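A direct transcription of this random site selection is sketched below (the function name is hypothetical). Python's `random.random()` returns values in the half-open interval $[0, 1)$, so the result never exceeds $N^2$.

```python
import random


def random_site(N, rng=random):
    """Pick a site label in 1..N*N via i = int(r * N^2) + 1 with r in [0, 1)."""
    r = rng.random()
    return int(r * N * N) + 1
```

In practice `rng.randrange(1, N * N + 1)` yields the same distribution and would be the more idiomatic choice; the explicit form is shown here only because it mirrors the formula.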


(4)  Measurement 

As soon as thermalization has been achieved the procedure to measure interesting
observables can be started. Such a procedure consists of collecting the required data
and calculating expectation values as was illustrated in Eq. (15.47) for the case of
the expectation value of the energy. A more detailed study of estimator techniques is
postponed to Chap. 19. However, there is one crucial point one should be aware of:
We already mentioned in our discussion of the METROPOLIS algorithm in Sect. 14.3
that subsequent configurations $\mathcal{C}_k$ may be strongly correlated. This problem can be
circumvented by simply neglecting intermediate configurations. For instance, one
may allow a couple of ‘empty’ sweeps between two measurements.
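The resulting schedule of thermalization, empty sweeps, and measurements can be written independently of the model. In the sketch below all names are hypothetical: `sweep` stands for any routine that advances the configuration by one sweep, `observe` for any routine that returns the observable of the current configuration, and the convention that each measurement is preceded by `n_skip` discarded sweeps plus one further sweep is an assumption made for this illustration.

```python
def run_measurements(sweep, observe, n_therm=500, n_skip=10, n_meas=100):
    """Thermalize, then record n_meas values separated by n_skip empty sweeps."""
    for _ in range(n_therm):            # thermalization: results are discarded
        sweep()
    data = []
    for _ in range(n_meas):
        for _ in range(n_skip):         # 'empty' sweeps to reduce correlations
            sweep()
        sweep()                         # the sweep that precedes the measurement
        data.append(observe())
    return data
```

With a dummy `sweep` that merely counts its calls, `n_therm = 5`, `n_skip = 2`, and `n_meas = 3`, the measurements are taken after 8, 11, and 14 sweeps.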

In  the  following  we  discuss  some  selected  results  obtained  with  the  numerical 
approach  described  above. 


15.3  Selected  Results 

We investigate the two-dimensional ISING model with periodic boundary conditions
and choose $h = 0$ and $J = 0.5$ for all following illustrations.

In a first experiment we plan to check the thermalization process and, thus,
measure after every single sampling step and skip thermalization. The observables
of interest, the expectation value of the energy per particle, $\langle \varepsilon \rangle$, and the expectation
value of the magnetization per particle, $\langle m \rangle$, are illustrated in Fig. 15.3 for 30 sweeps
in a system of the size $N = 50$ which corresponds to $M \approx 8 \times 10^4$ measurements.
Moreover, we set $k_B T = 3$ which should be well above $T_c$ according to Eq. (15.42).
Hence, we expect paramagnetic behavior, i.e. $\langle m \rangle = 0$ in equilibrium, since the
acceptance probability is rather large because the spins are randomly orientated. In
addition, Fig. 15.4 shows a typical spin configuration for a temperature well above
$T_c$.

Fig. 15.3  Time evolution of (a) the expectation value of the energy per particle $\langle \varepsilon \rangle$
and (b) of the expectation value of the magnetization per particle $\langle m \rangle$ vs the
number of measurements $M$ (horizontal axes: $M \times 10^4$). We used a cold start (solid
line) and a hot start (dashed line) to achieve these results

Fig. 15.4  Typical spin configuration for a temperature well above the critical
temperature $T_c$. Black shaded areas correspond to spin up sites while the white
areas are spin down sites

According to Fig. 15.3b the expectation value of the magnetization per particle
$\langle m \rangle$ indeed approaches zero after a rather short thermalization period independent
of the starting procedure. This is certainly not the case for the energy expectation
value per particle $\langle \varepsilon \rangle$, Fig. 15.3a, which does not approach saturation even after
$M \approx 8 \times 10^4$ measurements for both starting procedures. The consequence is that
the thermalization period certainly needs to be longer than only 30 sweeps.




Keeping this result in mind we move on to perform the next check of our
numerics, namely to study the influence of the system size $N$ on the numerical
results we get for the observables $\langle \varepsilon \rangle$, $\langle m \rangle$ as well as $c_h$ and $\chi$ as functions of
temperature $T$. Let us outline the strategy: a thermalization period of 500 sweeps
will be used and 10 sweeps between each measurement will be discarded. Moreover,
we start with the hot start configuration and at a temperature $k_B T_0 = 3$ well above
$T_c$. After the measurements at $T_0$ have been finished, the temperature is slightly
decreased, $T_1 < T_0$.

One more point should be addressed: We perform a simulation using the
strategy outlined above and obtain as a result some observable $O$ as a function of
temperatures $\{T_n\}$, with $T_0$ the initial temperature well above $T_c$ and $T_{n+1} < T_n$.
From the physics point of view, this temperature dependence will, of course, be
most interesting for temperatures $T \approx T_c$. Thus, what we need is an adaptive
cooling strategy designed in such a way that the temperature is decreased rapidly
for temperatures $T \gg T_c$ or $T \ll T_c$, but for $T \approx T_c$ the temperature is modified
only minimally. [This question will also be a very important point in the discussion
of simulated annealing, a stochastic optimization strategy (see Sect. 20.3).] At the
moment we are satisfied with equally spaced temperatures, i.e. $T_{n+1} = T_n - \delta$, where
$\delta = \text{const}$, because we are mainly interested in studying the influence of the system size
$N$ on our calculations.
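Both kinds of temperature sequences can be generated with a few lines of code. In the sketch below the equally spaced schedule follows $T_{n+1} = T_n - \delta$ directly. The adaptive variant is purely hypothetical and is not taken from the text: it blends linearly between a large step `d_far` away from $T_c$ and a small step `d_near` within a window `width` around $T_c$, merely to illustrate the idea of slowing down near the transition. All names and default values are assumptions, and $k_B$ is absorbed into the temperature variables.

```python
def linear_schedule(kT0, delta, n_steps):
    """Equally spaced temperatures kT0, kT0 - delta, kT0 - 2*delta, ..."""
    return [kT0 - n * delta for n in range(n_steps)]


def adaptive_schedule(kT0, kT_min, kTc, d_far=0.2, d_near=0.02, width=0.3):
    """Hypothetical adaptive cooling: small steps close to kTc, large steps elsewhere."""
    temps = [kT0]
    while temps[-1] > kT_min:
        # dist is 0 at the critical temperature and saturates at 1 far from it.
        dist = min(abs(temps[-1] - kTc) / width, 1.0)
        temps.append(temps[-1] - (d_near + (d_far - d_near) * dist))
    return temps
```

Since every step is at least `d_near > 0`, the adaptive loop always terminates; its last temperature may lie slightly below `kT_min`.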

The error bars of the calculated expectation values have been obtained with the
help of Eq. (15.47). The error estimates for the heat capacity $c_h$ as well as for the
magnetic susceptibility $\chi$ are more complex to evaluate. The method employed is
referred to as statistical bootstrap, where $M = 100$ samples have been generated.
This method will be explained in some detail in Chap. 19.
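To give a first impression of the idea, the following sketch shows a generic bootstrap error estimate in Python. It is not the procedure of Chap. 19, whose details may differ: the function names are hypothetical, the measurements are assumed to be independent, and the sample variance is used as an example statistic under the assumption that quantities such as the heat capacity are obtained from the variance of the measured energies.

```python
import math
import random


def variance(data):
    """Sample variance <x^2> - <x>^2 of a list of numbers."""
    M = len(data)
    mean = sum(data) / M
    return sum((x - mean) ** 2 for x in data) / M


def bootstrap_error(data, statistic, n_boot=100, rng=random):
    """Standard deviation of `statistic` over n_boot data sets, each drawn
    from `data` with replacement and of the same length as `data`."""
    M = len(data)
    values = []
    for _ in range(n_boot):
        resample = [data[rng.randrange(M)] for _ in range(M)]
        values.append(statistic(resample))
    mean = sum(values) / n_boot
    return math.sqrt(sum((v - mean) ** 2 for v in values) / n_boot)
```

A call such as `bootstrap_error(energies, variance, n_boot=100)` then returns an error estimate for the energy variance, where `energies` denotes the list of measured energies.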

In Fig. 15.5 we compare the expectation value of the energy per particle, $\langle \varepsilon \rangle$, the
absolute value of the magnetization per particle $|\langle m \rangle|$, the overall heat capacity $c_h$
and the overall magnetic susceptibility $\chi$ for four system sizes $N = 5, 20, 50, 100$.
Furthermore, in Fig. 15.6 we show the curves for the system size of $N = 50$ together
with corresponding error bars.

We observe that the phase transition becomes sharper with increasing system
size. In fact, we know from the analytic solution given by ONSAGER that the
phase transition is infinitely sharp as $N \to \infty$. It is a quite obvious result of this study
that the system size $N$ should be greater than 20 to achieve acceptable results.

Furthermore,  we  presented  the  absolute  value  of  the  magnetization  rather  than  the 
magnetization  itself.  The  reason  is  that  for  T  <  Tc  the  ground  state  is  degenerate.  In 
particular,  the  state  with  all  spins  up  or  all  spins  down  is  equally  probable  since  we 
set  the  external  magnetic  field  h  —  0.  This  is  a  manifestation  of  the  Z2  symmetry  of 
the  Hamiltonian  discussed  in  Sect.  15.1. 

Of  particular  interest  is  the  region  around  the  critical  temperature,  referred  to 
as  the  critical  region .  In  this  region,  the  spins  are  not  perfectly  aligned  and  not  ran¬ 
domly  orientated  either.  In  this  region  the  spins  align  in  so  called  magnetic  domains , 
which  are  also  referred  to  as  WEISS  domains  [1].  A  typical  spin  configuration  which 
exhibits  such  domains  is  presented  in  Fig.  15.7. 
Fig. 15.5  (a) The expectation value of the energy per particle $\langle \varepsilon \rangle$, (b) the absolute value of the
expectation value of the magnetization per particle $|\langle m \rangle|$, (c) the heat capacity $c_h$, and (d) the
magnetic susceptibility $\chi$ vs temperature $k_B T$ for the two-dimensional Ising model. The system
sizes are $N = 5, 20, 50, 100$


We  conclude  this  chapter  with  an  interesting  note:  Fig.  15.6  makes  it  quite  clear 
that  the  error  of  the  expectation  value  of  the  magnetization  and  of  the  energy  is 
biggest  for  values  around  the  transition  temperature.  In  fact,  if  we  increase  the 
system  size  the  error  will  become  even  larger.  The  reason  is  quite  obvious:  The  error 
of  our  Monte-Carlo  integration  is  proportional  to  the  square  root  of  the  variance 
of  the  investigated  observable.  However,  since  we  deal  with  a  second  order  phase 
transition, this variance tends to infinity as $N \to \infty$ [4]. There is one cure to the
problem:  We  are  dealing  here  with  finite-sized  systems,  thus,  the  variance  will  never 
actually  be  infinitely  large.  Furthermore,  according  to  Eq.  (15.47)  we  can  decrease 
the  error  by  increasing  the  number  of  measurements.  Hence,  if  one  is  confronted 
with large systems, one also has to perform many measurements in order to reduce
the error.$^{14}$


14 We  note  from  Eq.  (15.47)  that  we  have  to  perform  four  times  as  many  measurements  in  order  to 
reduce  the  error  by  a  factor  2. 
Fig. 15.6  (a) The expectation value of the energy per particle $\langle \varepsilon \rangle$, (b) the expectation value of
the magnetization per particle $|\langle m \rangle|$, (c) the heat capacity $c_h$, and (d) the magnetic susceptibility $\chi$
with error bars vs temperature $k_B T$ obtained for the two-dimensional ISING model of size $N = 50$

Fig. 15.7  For $T \approx T_c$ the spins organize in WEISS domains. Here we show a typical spin
configuration for $N = 100$ and $k_B T = 1.15$. The black shaded areas correspond to spin up sites
while the white areas indicate spin down sites
Summary

The Ising model is a rather simple model which effectively describes a second order
phase transition. Such phase transitions are the topic of extensive numerical studies
and, therefore, this model served here as a tool to demonstrate how to proceed from
the problem analysis to a numerical algorithm which allows one to simulate the
physics. The advantage of the ISING model was that under certain simplifications
solutions  could  be  derived  analytically.  In  the  course  of  this  analysis  the  important 
concept  of  observables  was  introduced.  Observables  are  certain  physical  properties 
of  a  system  which  characterize  the  specific  phenomenon  of  interest.  Numerically, 
observables  are  certain  variables  which  are  to  be  ‘measured’  within  the  course  of 
a  simulation.  After  the  extensive  analysis  of  the  ISING  model  the  transition  to  the 
numerical  analysis  of  the  two-dimensional  ISING  model  was  a  rather  easy  part.  The 
required  modification  of  spin  configurations  turned  out  to  be  the  key  element  of 
the  simulation  and  this  suggested  the  application  of  the  METROPOLIS  algorithm 
for  sampling.  Finally,  important  problems  like  initialization  of  the  simulation, 
thermalization,  finite  size  effects,  measurement  of  observables,  and  the  prevention 
of  correlations  between  subsequent  spin  configurations  caused  by  the  METROPOLIS 
algorithm  have  been  discussed  on  the  basis  of  concrete  calculations.  The  first  part 
of  this  chapter  was  motivated  by  W.  S.  Dorn  and  D.  D.  McCracken  [20]: 

Numerical  methods  are  no  excuse  for  poor  analysis. 


Problems 

1.  Write a program to simulate the two-dimensional ISING model with periodic
boundary conditions with the help of the METROPOLIS algorithm. Follow the
strategy outlined in Sect. 15.2 and try to reproduce the results illustrated in
Sect. 15.3 for $N = 5, 20, 50$.

In  particular,  as  a  first  step  write  a  routine  which  stores  the  nearest  neighbors 
of  the  square  lattice  in  an  array.  As  a  second  step,  write  a  program  which  performs 
a  sweep  through  the  lattice  geometry.  You  can  either  choose  the  lattice  sites 
systematically  or  at  random.  As  a  third  step,  set  up  the  main  program  which  calls 
the  sweep  routine.  Choose  some  initial  configuration  and  thermalize  the  system. 
Measure  the  expectation  value  of  the  energy  per  particle  as  well  as  the  absolute 
value of the expectation value of the magnetization for different temperatures $k_B T$
and  determine  the  respective  errors,  see  Eq.  (15.47).  Calculate  also  the  overall 
magnetic  susceptibility  and  the  overall  heat  capacity.  The  determination  of  the 
error  is  more  complicated  in  this  case  and  can  therefore  be  neglected  for  the 
moment. 

Good parameters to start with are $J = 0.5$, $N_{\text{therm}} = 500$, $N_{\text{skip}} = 10$ and
$h = 0.0$.

2.  Try also different values of $J$ and $h \neq 0$.
Chapter 16

Some  Basics  of  Stochastic  Processes 


16.1  Introduction 

This  chapter  is  devoted  to  an  introduction  to  some  basic  concepts  of  stochastic 
processes.  This  introduction  serves  two  purposes:  First  of  all,  it  allows  for  a  more 
systematic  treatment  of  non-deterministic  methods  in  Computational  Physics  which 
is  certainly  necessary  if  we  really  aim  at  an  understanding  of  these  methods. 
The  second  reason  can  be  found  in  the  elementary  importance  of  stochastics  in 
modern  theoretical  physics  and  chemistry  in  general.  Hence,  many  of  the  concepts 
elaborated  within  this  chapter  will  be  of  profound  importance  in  subsequent 
chapters.  For  instance,  we  present  a  discussion  of  diffusion  theory  in  Chap.  17  as 
a  motivating  example. 

The  reader  not  familiar  with  the  basics  of  probability  theory  [1-4]  is  highly 
encouraged  to  at  least  consult  Appendix  E  before  proceeding.  In  particular,  we  are 
going  to  apply  the  notation  introduced  in  Appendix  E  throughout  this  chapter. 

The  basics  of  stochastic  processes  will  be  discussed  within  five  sections  including 
this  introduction.  In  Sect.  16.2  we  introduce  basic  definitions  associated  with 
stochastic  processes  in  general.  Here  we  discuss  concepts  which  will  serve  as  a 
basis  for  an  understanding  of  the  methods  presented  within  the  subsequent  sections. 
Section  16.3  deals  with  a  special  class  of  stochastic  processes,  the  so  called 
Markov  processes.  As  we  shall  see,  these  processes  are  of  fundamental  importance 
for  statistical  physics  and  for  computational  methods.  Moreover,  in  Sect.  16.4 
we  consider  so  called  MARKOV-chains  which  are  discrete  Markov  processes 
defined  on  a  discrete  time  span.  This  will  serve  as  the  basis  of  a  very  important 
method  in  computational  physics,  the  so  called  MARKOV-Chain  Monte  Carlo 
technique.  We  already  encountered  a  simple  example  of  this  method  in  Sect.  14.3 
and  in  Chap.  15,  the  Metropolis  algorithm.  Finally,  in  Sect.  16.5  continuous-time 
MARKOV-chains  will  be  discussed,  in  particular,  discrete  Markov  processes  on  a 
continuous  time  span.  These  processes  are  very  important,  for  instance,  in  diffusion 
theory  as  will  be  demonstrated  in  Chap.  17. 
A discussion of detailed balance will also be included in the section on Markov
processes, Sect. 16.3, although detailed balance follows from physical arguments.
Detailed balance has already been introduced in our discussion of the Metropolis
algorithm, Sect. 14.3.


16.2  Stochastic  Processes 


The  following  discussion  is  primarily  restricted  to  one-dimensional  processes. 

A stochastic process is a time dependent process depending on randomness [5-7].
From a mathematical point of view, a stochastic process $Y_X(t)$ is a random
variable $Y$ which is a function of another random variable $X$ and time $t \geq 0$:

$$Y_X(t) = f(X, t) . \tag{16.1}$$

Here we apply the notation of Appendix E and denote random variables by
capital letters, such as $X$, and their realization by lower case characters, such as
$x$. Consequently, the realization of a stochastic process is described by

$$Y_x(t) = f(x, t) . \tag{16.2}$$

The set of all possible realizations of $Y_X(t)$ spans the state space of the stochastic
process. We note that it is in principle not necessary to define $t$ as the time in a
classical sense. It suffices to denote $t \in T$, where $T$ is a totally ordered set such as,
for instance, $T = \mathbb{N}$, the natural numbers. The set $T$ is referred to as the time span.
We distinguish four different scenarios:

•  discrete  state  space,  discrete  time  span, 

•  continuous  state  space,  discrete  time  span, 

•  discrete  state  space,  continuous  time  span, 

•  continuous  state  space,  continuous  time  span. 

Stochastic  processes  on  a  continuous  time  span  are  referred  to  as  continuous-time 
stochastic  processes. 

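Suppose the random variable $X$ follows the pdf $p_X(x)$. It is then an easy task to
calculate averages $\langle Y(t) \rangle$ of the stochastic process $Y_X(t)$ via

$$\langle Y(t) \rangle = \int \mathrm{d}x\, Y_x(t)\, p_X(x) . \tag{16.3}$$

This concept is easily extended to multiple times $t_1, t_2, \ldots, t_n$ by

$$\langle Y(t_1) Y(t_2) \cdots Y(t_n) \rangle = \int \mathrm{d}x\, Y_x(t_1) Y_x(t_2) \cdots Y_x(t_n)\, p_X(x) , \tag{16.4}$$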
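which defines the moments of the stochastic process [1, 5, 8]. Similar to the concept
of the correlation coefficient (see Appendix, Sect. E.10) we define the so called
auto-correlation function $K(t_1, t_2)$:

$$\begin{aligned} K(t_1, t_2) &= \frac{\left\langle [Y(t_1) - \langle Y(t_1) \rangle] [Y(t_2) - \langle Y(t_2) \rangle] \right\rangle}{\sqrt{\left\langle [Y(t_1) - \langle Y(t_1) \rangle]^2 \right\rangle \left\langle [Y(t_2) - \langle Y(t_2) \rangle]^2 \right\rangle}} \\ &= \frac{\langle Y(t_1) Y(t_2) \rangle - \langle Y(t_1) \rangle \langle Y(t_2) \rangle}{\sqrt{\mathrm{var}[Y(t_1)]\, \mathrm{var}[Y(t_2)]}} \\ &= \frac{\gamma[Y(t_1), Y(t_2)]}{\sqrt{\mathrm{var}[Y(t_1)]\, \mathrm{var}[Y(t_2)]}} . \end{aligned} \tag{16.5}$$

The function $\gamma[Y(t_1), Y(t_2)]$ is referred to as the auto-covariance function and is
defined as

$$\gamma[Y(t_1), Y(t_2)] = \mathrm{cov}[Y(t_1), Y(t_2)] . \tag{16.6}$$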
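These definitions can be made concrete by sampling the underlying random variable $X$. The sketch below estimates the mean values, the auto-covariance, and the auto-correlation function of a process $Y_X(t) = f(X, t)$ by simple Monte-Carlo averaging. The function name and the example process are assumptions chosen for illustration and do not appear in the text: we take $f(x, t) = \cos(t + x)$ with $X$ uniformly distributed on $[0, 2\pi)$, for which $\langle Y(t) \rangle = 0$, $\gamma = \cos(t_1 - t_2)/2$, and $K(t_1, t_2) = \cos(t_1 - t_2)$ follow analytically.

```python
import math
import random


def sample_moments(f, draw_x, t1, t2, n_samples=20000, rng=random):
    """Monte-Carlo estimates of <Y(t1)>, <Y(t2)>, the auto-covariance and the
    auto-correlation K(t1, t2) of the process Y_X(t) = f(X, t)."""
    xs = [draw_x(rng) for _ in range(n_samples)]     # realizations of X
    y1 = [f(x, t1) for x in xs]
    y2 = [f(x, t2) for x in xs]
    m1 = sum(y1) / n_samples
    m2 = sum(y2) / n_samples
    cov = sum(a * b for a, b in zip(y1, y2)) / n_samples - m1 * m2
    var1 = sum(a * a for a in y1) / n_samples - m1 * m1
    var2 = sum(b * b for b in y2) / n_samples - m2 * m2
    return m1, m2, cov, cov / math.sqrt(var1 * var2)


def f(x, t):
    """Illustrative process (an assumption): cosine with a random phase x."""
    return math.cos(t + x)


def draw_x(rng):
    """Draw the phase X uniformly from [0, 2*pi)."""
    return rng.uniform(0.0, 2.0 * math.pi)
```

For this example the estimates depend, within statistical error, only on the difference $t_1 - t_2$, which anticipates the notion of a stationary process.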


We proceed by defining the pdf of a stochastic process $Y_X(t)$. The pdf $p_1(y, t)$,
which describes the probability that the stochastic process $Y_X(t)$ takes on the
realization $y$ at time $t$, is given by (see Sect. 14.2)

$$p_1(y, t) = \int \mathrm{d}x\, \delta[y - Y_x(t)]\, p_X(x) . \tag{16.7}$$

We define, in analogy, the pdf $p_n(y_1, t_1, y_2, t_2, \ldots, y_n, t_n)$ which describes the
probability that the stochastic process takes on the realization $y_1$ at time $t_1$, $y_2$ at
time $t_2$, $\ldots$, and $y_n$ at time $t_n$ for arbitrary $n$:

$$p_n(y_1, t_1, y_2, t_2, \ldots, y_n, t_n) = \int \mathrm{d}x\, \delta[y_1 - Y_x(t_1)]\, \delta[y_2 - Y_x(t_2)] \cdots \delta[y_n - Y_x(t_n)]\, p_X(x) . \tag{16.8}$$

This is referred to as the hierarchy of pdfs. We note the following important
properties of the pdf $p_n(y_1, t_1, y_2, t_2, \ldots, y_n, t_n)$ [8]:

$$p_n(y_1, t_1, y_2, t_2, \ldots, y_n, t_n) \geq 0 , \tag{16.9}$$

$$p_n(\ldots, y_k, t_k, \ldots, y_l, t_l, \ldots) = p_n(\ldots, y_l, t_l, \ldots, y_k, t_k, \ldots) , \tag{16.10}$$

$$\int \mathrm{d}y_n\, p_n(y_1, t_1, \ldots, y_n, t_n) = p_{n-1}(y_1, t_1, \ldots, y_{n-1}, t_{n-1}) , \tag{16.11}$$

$$\int \mathrm{d}y\, p_1(y, t) = 1 . \tag{16.12}$$
The moments defined in Eq. (16.4) can also be expressed with the help of the pdfs
$p_n$ by

$$\langle Y(t_1) Y(t_2) \cdots Y(t_n) \rangle = \int \mathrm{d}y_1 \ldots \mathrm{d}y_n\, y_1 \cdots y_n\, p_n(y_1, t_1, \ldots, y_n, t_n) . \tag{16.13}$$

Conditional pdfs can also be introduced. They describe the probability that
we have $y_{k+1}$ at $t_{k+1}$, $\ldots$, $y_{k+l}$ at $t_{k+l}$ if there existed $y_1$ at $t_1$, $\ldots$, $y_k$ at $t_k$ via

$$p_{l|k}(y_{k+1}, t_{k+1}, \ldots, y_{k+l}, t_{k+l} | y_1, t_1, \ldots, y_k, t_k) = \frac{p_{k+l}(y_1, t_1, \ldots, y_{k+l}, t_{k+l})}{p_k(y_1, t_1, \ldots, y_k, t_k)} . \tag{16.14}$$

It follows that

$$\int \mathrm{d}y_2\, p_{1|1}(y_2, t_2 | y_1, t_1) = 1 . \tag{16.15}$$

Let  us  give  some  further  definitions  [8]: 

•  A stochastic process is referred to as a stationary process if the moments defined in Eq. (16.4) are invariant under a time-shift $\Delta t$:

$$\langle Y(t_1)Y(t_2)\cdots Y(t_n)\rangle = \langle Y(t_1+\Delta t)Y(t_2+\Delta t)\cdots Y(t_n+\Delta t)\rangle \,. \tag{16.16}$$

In particular, one has $\langle Y(t)\rangle = \mathrm{const}$ and the auto-covariance depends only on the time difference $|t_1 - t_2|$:

$$\gamma(t_1,t_2) = \mathrm{cov}[Y(t_1),Y(t_2)] = \mathrm{cov}[Y(0),Y(|t_1-t_2|)] = \gamma(t_1-t_2) \,. \tag{16.17}$$

It is understood that $\gamma(t) = \gamma(-t)$. Moreover, we have

$$p_n(y_1,t_1+\Delta t,\ldots,y_n,t_n+\Delta t) = p_n(y_1,t_1,\ldots,y_n,t_n) \,, \tag{16.18}$$

and in particular, $p_1(y,t) = p_1(y)$.

•  A time-homogeneous process is a stochastic process whose conditional pdfs are stationary

$$p_{1|1}(y_2,t_2|y_1,t_2-\tau) = p_{1|1}(y_2,s_2|y_1,s_2-\tau) \,, \tag{16.19}$$

for all $t_2$, $\tau$, $s_2$. The pdf $p_{1|1}$ is referred to as transition probability.

•  A process of stationary increments is a stochastic process $Y_x(t)$ for which the difference $Y_x(t_2) - Y_x(t_1)$ is stationary for all $t_2 - t_1$, with $t_2 > t_1 \geq 0$. This means, in particular, that the pdf of this process depends only on the time difference $t_2 - t_1$. The quantities $Y_x(t_2) - Y_x(t_1)$ are referred to as increments.

•  A process of independent increments is a stochastic process $Y_x(t)$ for which the differences

$$Y_x(t_2) - Y_x(t_1),\; Y_x(t_3) - Y_x(t_2),\; \ldots,\; Y_x(t_n) - Y_x(t_{n-1}) \,,$$

are independent for all $t_n > t_{n-1} > \ldots > t_2 > t_1$.

•  A Lévy process is a continuous-time stochastic process with stationary independent increments which starts with $Y_x(0) = 0$.

•  A Gaussian process is a stochastic process $Y_x(t)$ for which all finite linear combinations of $Y_x(t)$, $t \in T$, follow a normal distribution (see Appendix, Sect. E.7). We shall come back to this kind of process in Chap. 17.

•  A Wiener process is a continuous-time stochastic process with independent increments which starts with $Y_x(0) = 0$ and for which the increments $Y_x(t_2) - Y_x(t_1)$ follow a normal distribution with mean 0 and variance $t_2 - t_1$. The Wiener process is a special case of a Lévy process. One of the main applications of the Wiener process is to study Brownian motion or diffusion. This process will be discussed in more detail in Sect. 16.3 and in Chap. 17.

•  The random walk is the discrete analogue of the Wiener process [9-11]. This means in particular that if the step size of the random walk goes to zero, the Wiener process is recovered. This point will be elucidated in Chap. 17.

After stating the most important definitions, we proceed to the next section in which the attention is on a special class of stochastic processes, the so-called Markov processes.


16.3  Markov  Processes 

A Markov process is a stochastic process $Y_x(t)$ for which the conditional pdf $p_{1|n-1}$ satisfies for arbitrary $n$ and $t_1 < t_2 < \cdots < t_n$ the relation

$$p_{1|n-1}(y_n,t_n|y_1,t_1,\ldots,y_{n-1},t_{n-1}) = p_{1|1}(y_n,t_n|y_{n-1},t_{n-1}) \,. \tag{16.20}$$

Hence, a Markov process is a process in which the probability of any state $(y_n, t_n)$ is determined solely by its precursor state $(y_{n-1}, t_{n-1})$ and is independent of the entire rest of the past [12]. Markov processes are of particular importance in natural sciences because of their rather simple structure. This will become clear throughout the rest of this book.

We note in passing that a process with independent increments is always a Markov process because

$$Y_x(t_{n+1}) = Y_x(t_n) + [Y_x(t_{n+1}) - Y_x(t_n)] \,, \tag{16.21}$$

is satisfied. Since the increment $Y_x(t_{n+1}) - Y_x(t_n)$ is, by definition, independent of all previous increments which gave rise to $Y_x(t_n)$, $Y_x(t_{n+1})$ depends only on $Y_x(t_n)$, which is exactly the Markov property (16.20).

The quantity $p_{1|1}(y_n,t_n|y_{n-1},t_{n-1})$ which appears in Eq. (16.20) is referred to as transition probability. Given the transition probability $p_{1|1}$ together with the pdf $p_1$, one can construct the whole hierarchy of pdfs (16.8) of the Markov process by calculating successively [8]:

$$\begin{aligned}
p_2(y_1,t_1,y_2,t_2) &= p_{1|1}(y_2,t_2|y_1,t_1)\,p_1(y_1,t_1) \,, \\
p_3(y_1,t_1,y_2,t_2,y_3,t_3) &= p_{1|2}(y_3,t_3|y_1,t_1,y_2,t_2)\,p_2(y_1,t_1,y_2,t_2) \\
&= p_{1|1}(y_3,t_3|y_2,t_2)\,p_{1|1}(y_2,t_2|y_1,t_1)\,p_1(y_1,t_1) \,, \\
&\;\;\vdots
\end{aligned} \tag{16.22}$$

Here we employed definition (16.14) and in the second step of the second equation we employed for $p_{1|2}$ the Markov property (16.20).

The fact that the whole hierarchy of pdfs can be constructed by repeating the steps illustrated in Eq. (16.22) reveals the rather simple structure of Markov processes. However, Eq. (16.22) contains another useful piece of information. We regard the pdf $p_3$ of (16.22) for three successive times $t_1 < t_2 < t_3$. First we integrate the left-hand side with respect to $y_2$ which yields with the help of property (16.10):

$$\int \mathrm{d}y_2\, p_3(y_1,t_1,y_2,t_2,y_3,t_3) = p_2(y_1,t_1,y_3,t_3) \,. \tag{16.23}$$

Hence, we have

$$p_2(y_1,t_1,y_3,t_3) = p_1(y_1,t_1) \int \mathrm{d}y_2\, p_{1|1}(y_3,t_3|y_2,t_2)\,p_{1|1}(y_2,t_2|y_1,t_1) \,, \tag{16.24}$$

or after dividing both sides by $p_1(y_1,t_1)$ and by keeping in mind Eq. (16.14) we arrive at:

$$p_{1|1}(y_3,t_3|y_1,t_1) = \int \mathrm{d}y_2\, p_{1|1}(y_3,t_3|y_2,t_2)\,p_{1|1}(y_2,t_2|y_1,t_1) \,. \tag{16.25}$$

This equation is known as the CHAPMAN-KOLMOGOROV equation [8, 13]. The interpretation of this equation is straightforward: the transition probability from $(y_1,t_1)$ to $(y_3,t_3)$ is equivalent to the transition probability from $(y_1,t_1)$ to $(y_2,t_2)$ multiplied by the transition probability from $(y_2,t_2)$ to $(y_3,t_3)$ when summed over all intermediate positions $y_2$. This is illustrated in Fig. 16.1.
Fig. 16.1 Illustration of the Chapman-Kolmogorov equation


We state a very important theorem: Any two non-negative functions $p_{1|1}$ and $p_1$ uniquely define a MARKOV process if the CHAPMAN-KOLMOGOROV equation (16.25) is obeyed and if

$$p_1(y_2,t_2) = \int \mathrm{d}y_1\, p_{1|1}(y_2,t_2|y_1,t_1)\,p_1(y_1,t_1) \,, \tag{16.26}$$

which follows immediately from the first equation in Eqs. (16.22) in combination with property (16.10).

As  a  first  example  we  consider  one  of  the  most  important  Markov  processes 
in  physics,  the  Wiener  process  [10].  Its  importance  stems  from  its  application  to 
the  description  of  Brownian  motion,  the  random  motion  of  dust  particles  on  a  fluid 
surface.  (In  Chap.  17  we  take  a  closer  look  at  diffusion  phenomena.)  The  transition 
probability  of  the  Wiener  process  is  of  the  form 

$$p_{1|1}(y_2,t_2|y_1,t_1) = \frac{1}{\sqrt{2\pi(t_2-t_1)}} \exp\left[-\frac{(y_2-y_1)^2}{2(t_2-t_1)}\right] \,. \tag{16.27}$$

The initial condition is given by

$$p_1(y_1,t_1=0) = \delta(y_1) \,. \tag{16.28}$$

A straightforward calculation proves that (16.27) indeed obeys the Chapman-Kolmogorov equation (16.25). Moreover, we deduce from Eq. (16.28) together with (16.26) that

$$p_1(y,t) = \frac{1}{\sqrt{2\pi t}} \exp\left(-\frac{y^2}{2t}\right) \,. \tag{16.29}$$
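The statement that the Gaussian kernel (16.27) obeys the Chapman-Kolmogorov equation (16.25) can also be checked numerically. The following Python sketch evaluates the integral over the intermediate position $y_2$ with the trapezoidal rule; the function names, the integration window `y_max`, the grid size and the sample points are arbitrary choices made for this illustration and are not taken from the text.

```python
import math

def wiener_kernel(y2, t2, y1, t1):
    """Transition pdf (16.27) of the Wiener process, valid for t2 > t1."""
    dt = t2 - t1
    return math.exp(-(y2 - y1) ** 2 / (2.0 * dt)) / math.sqrt(2.0 * math.pi * dt)

def chapman_kolmogorov_rhs(y3, t3, y1, t1, t2, y_max=12.0, n=4801):
    """Trapezoidal approximation of the y2 integral in Eq. (16.25)."""
    h = 2.0 * y_max / (n - 1)
    total = 0.0
    for k in range(n):
        y2 = -y_max + k * h
        weight = 0.5 if k in (0, n - 1) else 1.0  # end points count half
        total += weight * wiener_kernel(y3, t3, y2, t2) * wiener_kernel(y2, t2, y1, t1)
    return total * h

# Direct transition (y1, t1) -> (y3, t3) versus the route via all (y2, t2).
lhs = wiener_kernel(0.7, 2.0, -0.3, 0.5)
rhs = chapman_kolmogorov_rhs(0.7, 2.0, -0.3, 0.5, t2=1.2)
print(lhs, rhs)
```

Both numbers should agree up to the quadrature error, independently of the intermediate time chosen between $t_1$ and $t_3$.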


1 This form is equivalent to the above definition of the Wiener process, in particular to the requirement of normally distributed increments with variance $t_2 - t_1$.
Fig. 16.2 Three possible realizations of the Wiener process


The Wiener process is easily realized on the computer. We regard the one-dimensional case and start at the origin $Y_x(0) = 0$. By definition, the increments $Y_x(t + \mathrm{d}t) - Y_x(t)$ follow a normal distribution $\mathcal{N}(\mathrm{d}y|0,\mathrm{d}t)$ of mean zero and variance $\mathrm{d}t$. Hence, we start with $y_0 = 0$ at time $t_0 = 0$, sample the displacement $\mathrm{d}y$ within a time-step $\mathrm{d}t$ from $\mathcal{N}(\mathrm{d}y|0,\mathrm{d}t)$ and calculate the next position at time $t_1 = t_0 + \mathrm{d}t$ which is given by $y_1 = y_0 + \mathrm{d}y$. This process is repeated until a certain time limit has been reached. Figure 16.2 presents the result of three such calculations which have been started using different seeds.
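The procedure just described can be sketched in a few lines of Python. This is a minimal illustration and not the code used to produce Fig. 16.2; the function name `wiener_path`, the step size, the time limit and the seeds are assumptions made for the example.

```python
import math
import random

def wiener_path(t_max, dt, seed):
    """Return times and positions of one realization starting at y = 0."""
    rng = random.Random(seed)
    n_steps = int(round(t_max / dt))
    times, positions = [0.0], [0.0]
    for k in range(1, n_steps + 1):
        dy = rng.gauss(0.0, math.sqrt(dt))  # increment: mean 0, variance dt
        times.append(k * dt)
        positions.append(positions[-1] + dy)
    return times, positions

# Three realizations started from different (arbitrary) seeds.
paths = [wiener_path(t_max=1.0, dt=0.001, seed=s) for s in (1, 2, 3)]

# Sanity check: over many runs the variance of Y(t_max) should be close to t_max.
finals = [wiener_path(1.0, 0.01, seed=s)[1][-1] for s in range(2000)]
mean = sum(finals) / len(finals)
var = sum((y - mean) ** 2 for y in finals) / (len(finals) - 1)
print(mean, var)
```

Because the increments are independent, the variance of the end point does not depend on the step size, which is why the check may use a coarser `dt` than the plotted paths.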

Let us mention a second very important MARKOV process, the POISSON process. The POISSON process is particularly interesting for problems involving waiting times, such as the decay of some radioactive nucleus. However, we shall also come across the POISSON process within the context of diffusion in Chap. 17. The transition probability of the POISSON process is defined as

$$p_{1|1}(n_2,t_2|n_1,t_1) = \frac{(t_2-t_1)^{n_2-n_1}}{(n_2-n_1)!} \exp[-(t_2-t_1)] \,. \tag{16.30}$$

Here it is understood that $n_1, n_2 \in \mathbb{N}$ and $n_2 \geq n_1$. Hence, the POISSON process counts the number of occurrences of a certain event until the time $t_2$ is reached under the premise that $n_1$ events have already occurred at time $t_1$. The POISSON process is initialized by the pdf

$$p_1(n_1,t_1=0) = \delta_{n_1 0} \,, \tag{16.31}$$

where $\delta_{ij}$ is the Kronecker-$\delta$. Hence we have according to Eq. (16.26)

$$p_1(n,t) = \sum_{n_1} p_{1|1}(n,t|n_1,t_1=0)\,p_1(n_1,t_1=0) = \frac{t^n}{n!} \exp(-t) \,, \tag{16.32}$$

2 Alternatively, we may sample $\mathrm{d}y$ from a normal distribution with variance 1 and multiply it by $\sqrt{\mathrm{d}t}$. This follows from a simple transformation of variables.
Fig. 16.3 Three possible realizations of a POISSON process


which is a POISSON distribution (see Appendix, Sect. E.4 and, for instance, Ref. [14]). Let us briefly consider the time between two events. Suppose we had $n_1$ events at time $t_1$. Then, we calculate the probability that at time $t_2 = t_1 + \tau$ we still count $n_2 = n_1$ events, i.e. that nothing happened. We have

$$p_{1|1}(n_1,t_1+\tau|n_1,t_1) = \exp(-\tau) \,, \tag{16.33}$$

and the waiting times are independent and follow an exponential distribution. We may simulate the POISSON process by starting at $t_1 = 0$ with $n_1 = 0$ and by increasing $n_2, n_3, \ldots$ by one, i.e. $n_{i+1} = n_i + 1$, after successive waiting times $\tau_1, \tau_2, \ldots$ which we sample from the exponential distribution (see Sect. 13.2) until a final count $N$ has been reached. The result of such a procedure is illustrated in Fig. 16.3 where $n(t)$ vs $t$ has been plotted for three runs started by different seeds.
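A possible realization of this recipe in Python is sketched below. It is an illustration under stated assumptions rather than the program behind Fig. 16.3: the function names, the seeds and the optional `rate` parameter (equal to one in the text) are introduced only for this example.

```python
import random

def poisson_process(n_final, seed, rate=1.0):
    """Event times of a Poisson process with counts 0, 1, ..., n_final."""
    rng = random.Random(seed)
    t, times, counts = 0.0, [0.0], [0]
    for n in range(1, n_final + 1):
        t += rng.expovariate(rate)  # exponentially distributed waiting time
        times.append(t)
        counts.append(n)
    return times, counts

# Three runs started from different (arbitrary) seeds; plot counts vs times.
runs = [poisson_process(n_final=20, seed=s) for s in (1, 2, 3)]

def count_at(t, seed):
    """Number of events that occurred up to time t in one realization."""
    times, _ = poisson_process(n_final=200, seed=seed)
    return sum(1 for s in times[1:] if s <= t)

# Check against Eq. (16.32): the mean count at time t should be close to t.
t_obs = 5.0
samples = [count_at(t_obs, seed) for seed in range(2000)]
mean_count = sum(samples) / len(samples)
print(mean_count)
```

The check uses the fact that the POISSON distribution (16.32) has mean $t$; the cut-off of 200 events per run is chosen large enough that it is practically never reached before `t_obs`.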

Finally, we remark that for a time-homogeneous Markov process the transition probability $p_{1|1}(y_2,t_2|y_1,t_1)$ depends by definition on the time difference $t_2 - t_1 = \tau$ rather than explicitly on the two times $t_1$ and $t_2$ and is usually denoted by $T_\tau(y_2, y_1)$.

We now turn our attention to another very important general concept of Markov processes, the master equation [8]. This equation is in fact the differential form of the Chapman-Kolmogorov equation. We regard the Chapman-Kolmogorov equation (16.25) for three successive times $t_1 < t_2 < t_3 = t_2 + \tau$ where $\tau$ is assumed to be small, i.e. $\tau \ll 1$. We expand the conditional pdf $p_{1|1}$ in a Taylor series with respect to $\tau$:

$$p_{1|1}(y_3,t_2+\tau|y_2,t_2) = p_{1|1}(y_3,t_2|y_2,t_2) + \tau \left.\frac{\partial}{\partial \tau} p_{1|1}(y_3,t_2+\tau|y_2,t_2)\right|_{\tau=0} + \mathcal{O}(\tau^2) \,. \tag{16.34}$$

In order to transform this equation into a more transparent form, we introduce the transition rate $w(y_3|y_2,t_2)$ from $y_2$ to $y_3$, with $y_2 \neq y_3$:

$$w(y_3|y_2,t_2) = \left.\frac{\partial}{\partial \tau} p_{1|1}(y_3,t_2+\tau|y_2,t_2)\right|_{\tau=0} \,. \tag{16.35}$$
In addition, we note that the first term on the right-hand side of Eq. (16.34) has to be of the form:

$$p_{1|1}(y_3,t_2|y_2,t_2) = \delta(y_3 - y_2) \,. \tag{16.36}$$

On the other hand, we defined the transition rate (16.35) only for elements $y_2 \neq y_3$ and, thus, we denote the remaining part (i.e. $y_2 = y_3$) by $a(y_2,t_2)$. All this allows us to rewrite Eq. (16.34) as

$$p_{1|1}(y_3,t_2+\tau|y_2,t_2) = [1 + a(y_2,t_2)]\,\delta(y_3 - y_2) + \tau\, w(y_3|y_2,t_2) \,, \tag{16.37}$$

where we neglected terms of order $\mathcal{O}(\tau^2)$. The pdf $p_{1|1}(y_3,t_2+\tau|y_2,t_2)$ is subject to the normalization (16.15) and this provides us with the required condition to determine $a(y_2,t_2)$:

$$a(y_2,t_2) = -\tau \int \mathrm{d}y_3\, w(y_3|y_2,t_2) \,. \tag{16.38}$$

Hence, the term $1 + a(y_2,t_2)$ describes the probability that no event occurs within the time interval $[t_2, t_2 + \tau]$. The expansion (16.37) is inserted into the Chapman-Kolmogorov equation (16.25) with the result:

$$\frac{p_{1|1}(y_3,t_2+\tau|y_1,t_1) - p_{1|1}(y_3,t_2|y_1,t_1)}{\tau} = \int \mathrm{d}y_2 \left[ w(y_3|y_2,t_2)\,p_{1|1}(y_2,t_2|y_1,t_1) - w(y_2|y_3,t_2)\,p_{1|1}(y_3,t_2|y_1,t_1) \right] \,. \tag{16.39}$$

Finally, we arrive, in the limit $\tau \to 0$, at the master equation:

$$\frac{\partial}{\partial t} p_{1|1}(y,t|y',t') = \int \mathrm{d}y'' \left[ w(y|y'',t)\,p_{1|1}(y'',t|y',t') - w(y''|y,t)\,p_{1|1}(y,t|y',t') \right] \,. \tag{16.40}$$

We multiply this equation by $p_1(y',t')$ and integrate over $y'$. This results in the master equation for the pdf $p_1(y,t)$

$$\frac{\partial}{\partial t} p_1(y,t) = \int \mathrm{d}y' \left[ w(y|y',t)\,p_1(y',t) - w(y'|y,t)\,p_1(y,t) \right] \,, \tag{16.41}$$

where we made use of the property (16.26).

Let us briefly discuss this result: In its derivation we assumed the state space to be continuous. However, if a master equation for a discrete state space is required, the integral is to be replaced by a sum over the discrete states. Moreover, the physical interpretation of such an equation is straightforward: The time evolution of the quantity $p_1(y,t)$ is governed by the sum over all transitions into state $y$ (first term) minus all transitions out of state $y$ (second term). We remark that master equations occur commonly in physical applications; for instance, the collision integral of the Boltzmann equation is of a similar form. The transition rates $w(y|y',t)$ can be determined explicitly in many applications in physics.
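For a discrete state space the master equation takes the form $\dot p_i = \sum_{j \neq i} [w_{ij} p_j - w_{ji} p_i]$, where $w_{ij}$ is the rate from state $j$ into state $i$. The following Python sketch integrates this equation with the explicit Euler method. The three-state rate matrix, the step size and the function names are hypothetical choices for illustration; any of the integrators of Part I could be used instead.

```python
def master_rhs(p, w):
    """Right-hand side of the discrete master equation.

    w[i][j] is the transition rate from state j into state i (i != j).
    """
    n = len(p)
    rhs = []
    for i in range(n):
        gain = sum(w[i][j] * p[j] for j in range(n) if j != i)        # into i
        loss = sum(w[j][i] for j in range(n) if j != i) * p[i]        # out of i
        rhs.append(gain - loss)
    return rhs

def integrate(p0, w, dt, n_steps):
    """Explicit Euler integration of dp/dt = gain - loss."""
    p = list(p0)
    for _ in range(n_steps):
        dp = master_rhs(p, w)
        p = [pi + dt * dpi for pi, dpi in zip(p, dp)]
    return p

# Hypothetical three-state system with constant rates, started in state 0.
w = [[0.0, 1.0, 0.5],
     [2.0, 0.0, 1.0],
     [0.5, 3.0, 0.0]]
p_final = integrate([1.0, 0.0, 0.0], w, dt=0.001, n_steps=20000)
print(p_final, sum(p_final))
```

Since every gain term of one state is a loss term of another, the total probability is conserved at each step, and for long times the right-hand side vanishes, i.e. the gains and losses of each state balance.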

Furthermore, if the system is in a stationary state then it is described by a stationary distribution $p_1(y,t) = p_1(y)$ and we obtain from (16.41) the relation

$$\int \mathrm{d}y'\, w(y|y',t)\,p_1(y') = \int \mathrm{d}y'\, w(y'|y,t)\,p_1(y) \,, \tag{16.42}$$

which is referred to as global balance. The much stronger condition

$$w(y|y',t)\,p_1(y') = w(y'|y,t)\,p_1(y) \,, \tag{16.43}$$

is referred to as detailed balance and will be discussed next.

The task now is to prove that the equilibrium distribution function $p_e(x)$ of a classical physical system will, under certain restrictions, indeed fulfill detailed balance. (This proof was given by N.G. VAN KAMPEN [5].) The next steps of the proof become more transparent if a vector $x = (q_k, p_k)^T \in \mathbb{R}^{6N}$ is introduced which represents the phase space trajectory of the $N$ particles constituting the system under investigation. This trajectory is determined by HAMILTON's equations of motion [16-18]:

$$\dot q_k = \frac{\partial H}{\partial p_k} \,, \qquad \dot p_k = -\frac{\partial H}{\partial q_k} \,. \tag{16.44}$$


Furthermore, $Y_x(t)$ denotes a stochastic process which describes some observable of the physical system. We require that $Y_x(t)$ is invariant under time reversal. We also assume the equilibrium distribution function $p_e(x)$ to be invariant under time reversal, which in most cases is equivalent to the requirement that the HAMILTON function $H(x)$ is invariant under time reversal. The operation of time reversal will be indicated by barred variables:

$$\bar t = -t \,, \qquad \bar x = (q_k, -p_k)^T \,. \tag{16.45}$$

3 As an example we quote Fermi's golden rule [15], where the transition rate $w_{nn'}$ from unperturbed states $n$ to $n'$ is of the form

$$w_{nn'} \propto |H'_{nn'}|^2 \rho(E_n) \,,$$

where $H'_{nn'}$ are the matrix elements of the perturbation Hamiltonian $H'$ and $\rho(E_n)$ denotes the density of states of the unperturbed system.
Hence, the above assumptions result in

$$\bar Y_x(t) = Y_{\bar x}(\bar t) = Y_{\bar x}(-t) = Y_x(t) \,, \tag{16.46}$$

and

$$\bar p_e(x) = p_e(\bar x) = p_e(x) \,. \tag{16.47}$$

In particular, we deduce from Eq. (16.46) that

$$Y_{\bar x}(0) = Y_x(0) \,, \tag{16.48}$$

and

$$Y_{\bar x}(t) = Y_x(-t) \,. \tag{16.49}$$

We calculate the pdf $p_2$ from

$$p_2(y_1,0,y_2,t) = \int \mathrm{d}x\, \delta[y_1 - Y_x(0)]\,\delta[y_2 - Y_x(t)]\,p_e(x) \,. \tag{16.50}$$

However, since we integrate over the whole phase space we recognize that the volume is invariant under a change $\mathrm{d}x \to \mathrm{d}\bar x$. Thus, we can change the variable of integration from $x$ to $\bar x$ and this results in:

$$\begin{aligned}
p_2(y_1,0,y_2,t) &= \int \mathrm{d}\bar x\, \delta[y_1 - Y_{\bar x}(0)]\,\delta[y_2 - Y_{\bar x}(t)]\,p_e(\bar x) \\
&= \int \mathrm{d}x\, \delta[y_1 - Y_x(0)]\,\delta[y_2 - Y_x(-t)]\,p_e(x) \\
&= p_2(y_2,-t,y_1,0) \\
&= p_2(y_2,0,y_1,t) \,.
\end{aligned} \tag{16.51}$$

We obtain immediately:

$$p_{1|1}(y_2,t|y_1,0)\,p_e(y_1) = p_{1|1}(y_1,t|y_2,0)\,p_e(y_2) \,. \tag{16.52}$$

Differentiation of this equation with respect to $t$ together with definition (16.35) yields for small values of $t$

$$w(y_2|y_1)\,p_e(y_1) = w(y_1|y_2)\,p_e(y_2) \,, \tag{16.53}$$

which is the condition of detailed balance, Eq. (16.43), for stationary distributions.

It should be noted at this point that detailed balance in physical systems is strongly connected to the entropy growth (the H-theorem by BOLTZMANN [19]). Here, detailed balance was based on the condition that the stochastic process $Y_x(t)$ was invariant under time reversal and that the equilibrium distribution $p_e(x)$ had the same property. This has in most cases the consequence that the Hamilton function is also invariant under time reversal transformations. However, a detailed discussion of these properties is far beyond the scope of this book.

We continue our presentation with so-called MARKOV-chains which are a special class of Markov processes. MARKOV-chains will prove to be very important for the understanding of MARKOV-chain Monte Carlo techniques, such as the Metropolis algorithm.


16.4  MARKOV-Chains 

A MARKOV-chain is a time-homogeneous Markov process defined on a discrete time span and in a discrete state space [20-22]. Hence, we express the time instances by integers, $T = \mathbb{N}$, $t_n = n$ where $n \in \mathbb{N}$, and possible outcomes are indexed by integers, $Y_x(t_n) \in \{m\}$ where $m \in \mathbb{N}$. As a first consequence of the discreteness of the state space we replace all pdfs $p(\cdot)$ by probabilities $\Pr(\cdot)$. Hence the Markov property (16.20) reads

$$\Pr(Y_{n+1} = y \,|\, Y_n = y_n, \ldots, Y_1 = y_1) = \Pr(Y_{n+1} = y \,|\, Y_n = y_n) \,, \tag{16.54}$$

where we applied the notation $Y_n = Y_x(t_n)$ and $y_n \in \{m\}$ is one particular realization out of the discrete state space. Since we assume the transition probabilities to be independent of the actual time, we can define a transition matrix $P = \{p_{ij}\}$ via

$$p_{ij} = \Pr(Y_{n+1} = j \,|\, Y_n = i) \,. \tag{16.55}$$

Consequently, we write

$$\Pr(Y_n = i_n, Y_{n-1} = i_{n-1}, \ldots, Y_0 = i_0) = \Pr(Y_0 = i_0)\, p_{i_0 i_1}\, p_{i_1 i_2} \cdots p_{i_{n-1} i_n} \,. \tag{16.56}$$

We note that the transition matrix is a stochastic matrix, a matrix with only non-negative elements such that the sum of each row is equal to one. Furthermore, one can prove that the product of two stochastic matrices results, again, in a stochastic matrix.

We define the state vector at time $n$, $\pi^{(n)} = \{\pi_i^{(n)}\}$, as

$$\pi_i^{(n)} = \Pr(Y_n = i) \,. \tag{16.57}$$
From the marginalization rule (Appendix, Sect. E.6) follows for the particular case $n = 1$

$$\Pr(Y_1 = i) = \sum_k \Pr(Y_0 = k)\, \Pr(Y_1 = i \,|\, Y_0 = k) \,, \tag{16.58}$$

or with the help of the definitions (16.55) and (16.57):

$$\pi_i^{(1)} = \sum_k \pi_k^{(0)} p_{ki} \,. \tag{16.59}$$

Hence, we get for $n = 1$

$$\pi^{(1)} = \pi^{(0)} P \,, \tag{16.60}$$

and for $n = 2$

$$\pi^{(2)} = \pi^{(1)} P = \pi^{(0)} P^2 \,. \tag{16.61}$$

Obviously,

$$\pi^{(n)} = \pi^{(0)} P^n \,, \tag{16.62}$$

follows for arbitrary $n$. Hence the probability matrix for an $n$ step transition $P^{(n)}$ is given by $P^{(n)} = P^n$. We immediately deduce that the Chapman-Kolmogorov equation for MARKOV-chains is fulfilled since

$$P^{(n)} P^{(m)} = P^n P^m = P^{n+m} = P^{(n+m)} \,, \tag{16.63}$$

for two integers $n$ and $m$.
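The relations (16.60) to (16.63) are easily reproduced numerically. The sketch below uses plain Python lists; the three-state transition matrix and the helper names `vec_mat`, `mat_mul` and `mat_pow` are assumptions introduced only for this illustration.

```python
def vec_mat(pi, P):
    """Row vector times matrix: (pi P)_j = sum_i pi_i p_ij."""
    n = len(P)
    return [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]

def mat_mul(A, B):
    """Product of two square matrices stored as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(P, n):
    """n-step transition matrix P^(n) = P^n by repeated multiplication."""
    size = len(P)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, P)
    return result

# Hypothetical three-state transition matrix; every row sums to one.
P = [[0.5, 0.5, 0.0],
     [0.2, 0.3, 0.5],
     [0.0, 0.4, 0.6]]
pi0 = [1.0, 0.0, 0.0]

pi = pi0
for _ in range(5):
    pi = vec_mat(pi, P)                   # five applications of Eq. (16.60)
pi_direct = vec_mat(pi0, mat_pow(P, 5))   # Eq. (16.62) with n = 5
print(pi, pi_direct)
```

Both routes give the same state vector, and the rows of `mat_pow(P, n)` again sum to one, in line with the remark that products of stochastic matrices are stochastic.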

Let us cite some further definitions in order to classify MARKOV-chains [5, 8, 20-22]:

•  The notation $i \to j$ means state $i$ leads to state $j$ and is true whenever there is a path of length $n$, $i_0 = i, i_1, \ldots, i_n = j$, such that all $p_{i_k i_{k+1}} > 0$ for $k = 0, 1, \ldots, n-1$. This is equivalent to $(P^n)_{ij} > 0$.

•  The notation $i \leftrightarrow j$ means state $i$ communicates with state $j$. This relation is true whenever $i \to j$ and $j \to i$.

•  A class of states is given if (i) all states within one class communicate with each other and (ii) two states of different classes never communicate with each other. These classes are referred to as the irreducible classes of the MARKOV-chain.

•  An irreducible MARKOV-chain is a MARKOV-chain in which the whole state space forms an irreducible class, i.e. all states communicate with each other.

•  A closed set of states is a set of states which never leads to states which are outside of this set.
•  An absorbing state is a state which does not lead to any other states: It forms itself a closed set. An absorbing state can be reached from the outside but there is no escape from it.

•  A state is referred to as transient if the probability of returning to the state is less than one.

•  A state is referred to as recurrent if the probability of returning to the state is equal to one.

•  Furthermore, we call a state positive recurrent if the expectation value of the first return time is less than infinity and null recurrent if it is infinity. We may formulate this in a more mathematical language: The time of first return to state $i$ is defined via

$$T_i = \inf(n \geq 1 : X_n = i \,|\, X_0 = i) \,. \tag{16.64}$$

The probability that we return to state $i$ for the first time after $n$ steps is defined as

$$f_i^{(n)} = \Pr(T_i = n) \,. \tag{16.65}$$

Hence, a state is referred to as recurrent if

$$F_i = \sum_n f_i^{(n)} = 1 \,, \tag{16.66}$$

positive recurrent if

$$\langle T_i \rangle = \sum_n n f_i^{(n)} < \infty \,, \tag{16.67}$$

and null recurrent if

$$\langle T_i \rangle = \sum_n n f_i^{(n)} = \infty \,. \tag{16.68}$$

We note that we also have $\langle T_i \rangle = \infty$ if state $i$ is transient. Furthermore, one can show that a state is recurrent only if

$$\sum_n p_{ii}^{(n)} = \infty \,, \tag{16.69}$$

where $p_{ii}^{(n)}$ denotes the diagonal element $(i,i)$ of the $n$ step transition matrix $P^{(n)}$.

•  A state is referred to as periodic if the return time of the state can only be a multiple of some integer $d > 1$.

•  A state is referred to as aperiodic if $d = 1$.

•  We call a state ergodic if it is positive recurrent and aperiodic.

•  A MARKOV-chain is called ergodic if all its states are ergodic.
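For a finite transition matrix several of these definitions can be tested mechanically. The following Python sketch checks which states lead to which others, whether a chain is irreducible, and estimates the period of a state. The function names, the two example matrices and the cut-off `n_max` in the period search are assumptions for this illustration; the period estimate in particular only inspects return times up to `n_max`.

```python
from math import gcd

def reachable(P):
    """reach[i][j] is True if state i leads to state j in one or more steps."""
    n = len(P)
    reach = [[P[i][j] > 0 for j in range(n)] for i in range(n)]
    for k in range(n):                       # Warshall's transitive closure
        for i in range(n):
            for j in range(n):
                reach[i][j] = reach[i][j] or (reach[i][k] and reach[k][j])
    return reach

def is_irreducible(P):
    """True if every state communicates with every other state."""
    reach = reachable(P)
    n = len(P)
    return all(reach[i][j] for i in range(n) for j in range(n))

def period(P, i, n_max=50):
    """gcd of all n <= n_max with (P^n)_ii > 0; 0 if no return was found."""
    n = len(P)
    row = [1.0 if j == i else 0.0 for j in range(n)]
    d = 0
    for step in range(1, n_max + 1):
        row = [sum(row[k] * P[k][j] for k in range(n)) for j in range(n)]
        if row[i] > 0:
            d = gcd(d, step)
    return d

flip = [[0.0, 1.0], [1.0, 0.0]]        # deterministic hopping between two states
absorbing = [[1.0, 0.0], [0.5, 0.5]]   # state 0 is absorbing
print(is_irreducible(flip), period(flip, 0))
print(is_irreducible(absorbing), period(absorbing, 0))
```

The hopping chain is irreducible but periodic with $d = 2$, whereas the second chain is not irreducible because the absorbing state never leads back to the other state.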
We give some useful theorems in the context of the above definitions: First of all, it can be proved that if a MARKOV-chain is irreducible it follows that either all states are transient, or all states are null recurrent, or all states are positive recurrent.

Furthermore, a theorem by KOLMOGOROV states that if a MARKOV-chain is irreducible and aperiodic then the limit

$$\pi_j = \lim_{n \to \infty} \pi_j^{(n)} = \frac{1}{\langle T_j \rangle} \,, \tag{16.70}$$

exists. It follows from the above discussion that if all states $j$ are transient or null recurrent we have

$$\pi_j = 0 \,, \tag{16.71}$$

and if all states $j$ are positive recurrent, we have

$$\pi_j \neq 0 \,. \tag{16.72}$$

In this case the state vector $\pi = \{\pi_j\}$ is referred to as the stationary distribution or equilibrium distribution. We note that in this context the term equilibrium does not mean that nothing changes, but that the system forgets its own past. In particular, as soon as the system reaches the stationary distribution, it is independent of the initial state $\pi^{(0)}$.

We concentrate now on equilibrium distributions. It follows from Eq. (16.62) that $\pi$ satisfies:

$$\pi = \pi P \,. \tag{16.73}$$

Thus, $\pi$ is the left-eigenvector of the transition probability matrix $P$ with eigenvalue 1. We note that Eq. (16.73) states a homogeneous eigenvalue problem: The solution is only determined up to a constant factor (see Sect. 8.3). However, it is clear that the vector $\pi$ satisfies

$$\sum_j \pi_j = 1 \,. \tag{16.74}$$

One can prove that the unique solution of the eigenvalue problem (16.73) together with the normalization condition (16.74) for $n$ states can be written as

$$\pi = e\,(I + E - P)^{-1} \,, \tag{16.75}$$

where $e$ is an $n$-element row vector containing only ones, $E$ is an $n \times n$ matrix containing only ones and $I$ is the $n \times n$ identity. Indeed, Eq. (16.73) gives $\pi(I - P) = 0$ and Eq. (16.74) gives $\pi E = e$, so that $\pi(I + E - P) = e$.
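The stationary distribution of a small chain can be obtained and checked with a few lines of Python. The sketch below approximates $\pi$ by repeated application of $\pi \leftarrow \pi P$ and then verifies the relation $\pi(I + E - P) = e$, which follows from $\pi = \pi P$ together with $\sum_j \pi_j = 1$. The transition matrix, the iteration count and the function name are assumptions made for this example; the iteration converges here because the chosen chain is irreducible and aperiodic.

```python
def stationary_by_iteration(P, n_iter=2000):
    """Approximate pi by repeatedly applying pi <- pi P to a uniform start."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(n_iter):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

# Hypothetical irreducible, aperiodic three-state chain.
P = [[0.5, 0.5, 0.0],
     [0.2, 0.3, 0.5],
     [0.0, 0.4, 0.6]]
pi = stationary_by_iteration(P)

# Build M = I + E - P and check that pi M equals the row vector of ones.
n = len(P)
M = [[(1.0 if i == j else 0.0) + 1.0 - P[i][j] for j in range(n)] for i in range(n)]
check = [sum(pi[i] * M[i][j] for i in range(n)) for j in range(n)]
print(pi, check)
```

For larger state spaces one would rather solve the linear system $\pi(I + E - P) = e$ directly with one of the methods for linear equations instead of iterating.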

Let us briefly elaborate on this point: if it is possible to construct a MARKOV-chain which possesses a unique stationary distribution, we know that it will definitely reach this distribution independent of the choice of initial conditions. The existence as well as the form of the stationary distribution is clearly determined by the transition probabilities $p_{ij}$. A sufficient condition for a unique stationary distribution to exist is the requirement of reversibility. A MARKOV-chain is referred to as reversible if

$$p_{ij}\,\pi_i = p_{ji}\,\pi_j \,, \quad \forall\, i, j \,, \tag{16.76}$$

i.e. if the transition probabilities ensure detailed balance for the stationary distribution $\pi$.

Now we are in a position to understand better why detailed balance was such an important concept of the Metropolis algorithm discussed in Sect. 14.3: Invoking the detailed balance condition ensures that for all possible initial states the MARKOV-chain converges toward the equilibrium distribution for which detailed balance is fulfilled. Of course, the convergence time will highly depend on the choice of the initial state as well as on the choice of the transition matrix. Hence, we can generate random numbers with the help of such a MARKOV-chain and after a thermalization period these numbers will follow the required pdf. Methods based on this concept are commonly referred to as MARKOV-chain Monte Carlo sampling methods [23-26].
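As a small illustration of this idea, the following Python sketch runs a Metropolis-type chain on four discrete states. It is a sketch under stated assumptions, not the algorithm of Sect. 14.3 verbatim: the target weights, the uniform proposal, the chain length and the seed are chosen for this example only. With a symmetric proposal and acceptance probability $\min(1, w_j/w_i)$ one has $\pi_i p_{ij} \propto \min(w_i, w_j)$, which is symmetric in $i$ and $j$, so that detailed balance (16.76) holds for $\pi_i \propto w_i$.

```python
import random

def metropolis_chain(weights, n_steps, seed):
    """Visit frequencies of a chain whose stationary pi_i is proportional to weights[i].

    Proposal: uniformly chosen state (symmetric).
    Acceptance probability: min(1, w_new / w_old).
    """
    rng = random.Random(seed)
    state = 0
    visits = [0] * len(weights)
    for _ in range(n_steps):
        proposal = rng.randrange(len(weights))
        if rng.random() < min(1.0, weights[proposal] / weights[state]):
            state = proposal
        visits[state] += 1
    return [v / n_steps for v in visits]

target = [1.0, 2.0, 3.0, 4.0]     # unnormalized target distribution (assumed)
freq = metropolis_chain(target, n_steps=200000, seed=7)
norm = sum(target)
print(freq, [w / norm for w in target])
```

The visit frequencies approach the normalized weights as the chain gets longer; for this short state space no separate thermalization period was discarded.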

We give a brief example, the spread of a rumor. Let $Z_1$ and $Z_2$ be two distinct versions of a report. If a person receives report $Z_1$, this person will pass the report on as $Z_1$ with probability $(1 - p)$ or as $Z_2$ with probability $p$. An alternative is that the person receives $Z_2$ and passes it on as $Z_2$ with probability $(1 - q)$ or modifies it to $Z_1$ with probability $q$. We summarize

•  $\Pr(Z_1 \to Z_1) = (1 - p) = p_{11}$ ,

•  $\Pr(Z_1 \to Z_2) = p = p_{12}$ ,

•  $\Pr(Z_2 \to Z_1) = q = p_{21}$ ,

•  $\Pr(Z_2 \to Z_2) = (1 - q) = p_{22}$ .

The transition matrix is of the form

$$P = \begin{pmatrix} 1-p & p \\ q & 1-q \end{pmatrix} \,. \tag{16.77}$$

We note that the two states communicate with each other, $Z_1 \leftrightarrow Z_2$, hence the MARKOV-chain is irreducible. Furthermore, since the process can reach either state $Z_1$ or $Z_2$ within a single time step, it is clearly aperiodic. Let us briefly investigate the probabilities of first recurrence $f_1^{(n)}$ after $n$ steps. Due to the theorem by Kolmogorov it is sufficient to investigate the state $Z_1$ since the MARKOV-chain is irreducible and it follows that also $Z_2$ has the same recurrence properties. We note the following possible paths for a first return to state $Z_1$:
•  1 : $\Pr(Z_1 \to Z_1) = (1 - p) = f_1^{(1)}$ ,

•  2 : $\Pr(Z_1 \to Z_2 \to Z_1) = pq = f_1^{(2)}$ ,

•  3 : $\Pr(Z_1 \to Z_2 \to Z_2 \to Z_1) = p(1-q)q = f_1^{(3)}$ ,

•  n : $\Pr(Z_1 \to Z_2 \to \cdots \to Z_2 \to Z_1) = p(1-q)^{n-2}q = f_1^{(n)}$ .

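The probability of returning to $Z_1$ is, see Eq. (16.66),

$$\begin{aligned}
F_1 &= \sum_{n=1}^{\infty} f_1^{(n)} \\
&= (1-p) + pq \sum_{n=0}^{\infty} (1-q)^n \\
&= (1-p) + pq\, \frac{1}{1-(1-q)} \\
&= 1 \,,
\end{aligned} \tag{16.78}$$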


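where we employed that $0 < (1-q) < 1$ as well as the convergence of the geometric series. Hence state $Z_1$ is recurrent and, therefore, also state $Z_2$. We calculate the expectation value of the first return time $\langle T_1 \rangle$:

$$\begin{aligned}
\langle T_1 \rangle &= \sum_{n=1}^{\infty} n f_1^{(n)} \\
&= (1-p) + pq \sum_{n=0}^{\infty} (n+2)(1-q)^n \\
&= (1-p) + 2pq \underbrace{\sum_{n=0}^{\infty} (1-q)^n}_{=1/q} + pq \sum_{n=0}^{\infty} n (1-q)^n \\
&= 1 + p - pq(1-q) \frac{\mathrm{d}}{\mathrm{d}q} \underbrace{\sum_{n=0}^{\infty} (1-q)^n}_{=1/q} \\
&= 1 + p + \frac{p}{q}(1-q) \\
&= \frac{p+q}{q} \,.
\end{aligned} \tag{16.79}$$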


Hence, the states $Z_1$ and $Z_2$ are positive recurrent as long as $p \neq 0$ and $q \neq 0$. This means that an equilibrium distribution exists and it can be obtained from Eq. (16.70). We have

$$\pi_1 = \frac{1}{\langle T_1 \rangle} = \frac{q}{p+q} \,. \tag{16.80}$$

Due to the normalization condition (16.74) we obtain

$$\pi_2 = 1 - \pi_1 = \frac{p}{p+q} \,, \tag{16.81}$$

and, therefore,

$$\langle T_2 \rangle = \frac{p+q}{p} \,. \tag{16.82}$$

Since all states are positive recurrent and aperiodic, the above MARKOV-chain is ergodic. Finally, we remark that this example also fulfills detailed balance since

$$\pi_1 p_{12} = \frac{pq}{p+q} = \pi_2 p_{21} \,. \tag{16.83}$$
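The results for the two-state chain, $\langle T_1 \rangle = (p+q)/q$, $\pi_1 = q/(p+q)$ and $\pi_2 = p/(p+q)$, can be cross-checked by a direct simulation of first return times. The Python sketch below does this for one pair of probabilities; the values of `p` and `q`, the seed, the sample size and the function name are arbitrary choices for this illustration.

```python
import random

def first_return_time(p, q, rng):
    """Number of steps until the chain started in Z1 is in Z1 again."""
    steps = 1
    if rng.random() < 1.0 - p:       # Z1 -> Z1: immediate return
        return steps
    while True:                       # the chain now sits in Z2
        steps += 1
        if rng.random() < q:         # Z2 -> Z1
            return steps

p, q = 0.3, 0.6                       # assumed modification probabilities
rng = random.Random(11)
samples = [first_return_time(p, q, rng) for _ in range(200000)]
mean_T1 = sum(samples) / len(samples)

pi1 = q / (p + q)                     # stationary probability of Z1
pi2 = p / (p + q)                     # stationary probability of Z2
print(mean_T1, (p + q) / q)           # simulated vs analytic mean return time
print(pi1 * p, pi2 * q)               # the two sides of detailed balance
```

The simulated mean return time fluctuates around the analytic value, and the two detailed balance products coincide exactly since both equal $pq/(p+q)$.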

Let us briefly interpret this example: Suppose the original, true version Z_1 of a report is ‘Mr. X is going to resign’ while Z_2 is just the opposite: ‘Mr. X is not going to resign’. The property of irreducibility of the MARKOV-chain reflects the fact that there is no version of the report which cannot be reached or modified. Moreover, we just demonstrated that the process is positive recurrent: Even if the probability p that Z_1 was modified to Z_2 is very small and the probability q that Z_2 was modified to Z_1 is very high, the report will infinitely often return to version Z_2 with probability one. This means that, with probability one, the public will be told infinitely often that Mr. X is not going to resign. The equilibrium probabilities π_1 and π_2 display the asymptotic probability of versions one and two of the report, respectively. However, as has already been emphasized, this does not mean that the report cannot be modified in equilibrium; it simply displays the fact that the probabilities reached a steady state. Finally, we note an interesting effect in passing: Suppose that the probabilities that either of the two versions is modified are very small but equal, i.e. p = q ≪ 1. Then the equilibrium distribution is

\pi_1 = \pi_2 = \frac{1}{2} ,  (16.84)

and the public will believe Z_1 and Z_2 with the same probability after some time, independently of the initial version and also independently of the actual decision of Mr. X. Detailed balance expresses the property that the probability of receiving Z_1 and passing it on as Z_2 is the same as the probability of receiving Z_2 and passing it on as Z_1.
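
The two-version report example can be checked numerically. The following is a minimal sketch, assuming the stationary probabilities π_1 = q/(p + q) and π_2 = p/(p + q) that follow from Eq. (16.83); the function names and the parameter values p = 0.1, q = 0.3 are illustrative choices and not part of the text.

```python
import random


def stationary_two_state(p, q):
    """Return (pi_1, pi_2) for flip probabilities p (Z1 -> Z2) and q (Z2 -> Z1)."""
    return q / (p + q), p / (p + q)


def fraction_in_z1(p, q, n_steps, start=1, seed=0):
    """Pass the report on n_steps times; return the fraction of steps showing Z1."""
    rng = random.Random(seed)
    state, hits = start, 0
    for _ in range(n_steps):
        flip_probability = p if state == 1 else q
        if rng.random() < flip_probability:
            state = 3 - state  # toggles between versions 1 and 2
        hits += state == 1
    return hits / n_steps


if __name__ == "__main__":
    p, q = 0.1, 0.3  # illustrative values
    pi_1, pi_2 = stationary_two_state(p, q)
    print("equilibrium probabilities:", pi_1, pi_2)
    print("detailed balance:", pi_1 * p, "vs", pi_2 * q)
    print("simulated fraction in Z1:", fraction_in_z1(p, q, 200_000))
```

The long-run fraction of time spent in Z_1 should approach π_1 regardless of the starting version, which is the statement of independence from the initial condition made above.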
We close this section with a final remark: It is an easy task to generalize the ideas of MARKOV-chains to continuous state spaces since we already introduced the required tools in Sect. 16.3. Let π(x) denote the stationary distribution density and p(x|y) the accompanying transition rate pdf. Then relation (16.73) transforms into

\pi(x) = \int dy \, \pi(y) \, p(x|y) ,  (16.85)

together with

\int dx \, \pi(x) = 1 ,  (16.86)

the usual normalization of pdfs. In this case, the condition of detailed balance is given by

\pi(x) \, p(y|x) = \pi(y) \, p(x|y) ,  (16.87)

which is equivalent to Eq. (16.52).


16.5  Continuous-Time  MARKOV-Chains 

A generalization of the results of the previous sections to a continuous time span is straight-forward. We define the continuous-time MARKOV-chain as a time-homogeneous Markov process on a discrete state space but with a continuous time span, t ≥ 0. Thus

\Pr[X(t + s) = n \,|\, X(s) = m] = p_{nm}(t) ,  (16.88)

is independent of s ≥ 0. In this case the transition matrix P(t) = {p_ij(t)} is an explicit function of time t. Its elements p_nm(t) have the following four properties:

(a) All matrix elements p_nm(t) of the transition matrix P are non-negative:

p_{nm}(t) \geq 0 , \quad \forall t \geq 0 .  (16.89)

(b) The usual normalization of the rows of the transition matrix P is valid:

\sum_m p_{nm}(t) = 1 , \quad \forall n \text{ and } t \geq 0 .  (16.90)

(c) As for every MARKOV process, the transition matrix of the continuous-time MARKOV-chain obeys the Chapman-Kolmogorov equation:

\sum_k p_{nk}(t) \, p_{km}(t') = p_{nm}(t + t') ,  (16.91)

which can alternatively be expressed as

P(t + t') = P(t) P(t') .  (16.92)

(d) We assume p_nm(t) to be a continuous function of t and that:

\lim_{t \to 0} p_{nm}(t) = \begin{cases} 1 & \text{for } n = m , \\ 0 & \text{for } n \neq m . \end{cases}  (16.93)

It follows from this equation that the matrix elements p_nm(t) can be written as

p_{nm}(t) = \begin{cases} 1 + q_{nn} t + O(t^2) & \text{for } n = m , \\ q_{nm} t + O(t^2) & \text{for } n \neq m , \end{cases}  (16.94)


where we introduced with {q_nm} = Q the transition rate matrix. The transition rate matrix Q obeys:

(a) All off-diagonal elements q_nm, n ≠ m, are non-negative since

q_{nm} = \lim_{t \to 0} \frac{p_{nm}(t)}{t} \geq 0 \quad \text{for } n \neq m .  (16.95)

(b) All diagonal elements q_nn are non-positive since

q_{nn} = - \lim_{t \to 0} \frac{1 - p_{nn}(t)}{t} \leq 0 .  (16.96)

(c) Differentiating Eq. (16.90) with respect to t yields that the sum over all elements in a row is equal to zero. Therefore, we conclude:

\sum_m q_{nm} = 0 \quad \Rightarrow \quad q_{nn} = - \sum_{m \neq n} q_{nm} .  (16.97)

Moreover, differentiating the Chapman-Kolmogorov equation with respect to t or t' gives the Kolmogorov forward or Kolmogorov backward equations

\dot{P}(t) = P(t) Q \quad \text{and} \quad \dot{P}(t) = Q P(t) ,  (16.98)

respectively. We obtain P(t) = exp(Qt) where the exponential function of a matrix has to be interpreted as

\exp(Qt) = \sum_{k=0}^{\infty} \frac{t^k}{k!} Q^k ,  (16.99)

where Q^0 = I is the identity matrix.

We define s as the time of the first jump of our process for the particular case X(0) = n:

s = \min [ t \,|\, X(t) \neq X(0) ] .  (16.100)

It can be shown that P_n(s > t), the probability that the jump occurs at some time s > t, is given by

P_n(s > t) = \exp(q_{nn} t) ,  (16.101)

where we note that q_nn ≤ 0. Moreover,

P_n[X(s) = m] = - \frac{q_{nm}}{q_{nn}} ,  (16.102)

and the process starts again at time s and in state m. This means that in a continuous-time MARKOV-chain the waiting times between two consecutive jumps are exponentially distributed. One of the simplest examples of a continuous-time MARKOV-chain is the POISSON process, discussed in Sect. 16.3.
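
The relation between the rate matrix Q, the matrix exponential (16.99) and the jump picture of Eqs. (16.101) and (16.102) can be illustrated with a short sketch. It assumes that Q[n][m] is the rate from state n to state m, so that each row sums to zero as in Eq. (16.97). The function names, the truncation at 60 series terms and the example matrix are illustrative assumptions.

```python
import math
import random


def expm_series(Q, t, terms=60):
    """Approximate P(t) = exp(Qt) by the truncated power series of Eq. (16.99)."""
    n = len(Q)
    result = [[float(i == j) for j in range(n)] for i in range(n)]  # k = 0: identity
    term = [row[:] for row in result]
    for k in range(1, terms):
        # term_k = term_{k-1} * Q * t / k, i.e. (t^k / k!) Q^k
        term = [[sum(term[i][l] * Q[l][j] for l in range(n)) * t / k
                 for j in range(n)] for i in range(n)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result


def state_at_time(Q, start, t_max, rng):
    """Simulate jumps: exponential waiting time with rate -q_nn (Eq. 16.101),
    then a jump to m with probability -q_nm / q_nn (Eq. 16.102)."""
    state, t = start, 0.0
    while True:
        rate = -Q[state][state]
        if rate <= 0.0:
            return state  # absorbing state: no further jumps
        t += rng.expovariate(rate)
        if t > t_max:
            return state
        u, acc, target = rng.random() * rate, 0.0, state
        for m, q_nm in enumerate(Q[state]):
            if m == state or q_nm <= 0.0:
                continue
            target = m
            acc += q_nm
            if u < acc:
                break
        state = target


if __name__ == "__main__":
    Q = [[-1.0, 1.0], [2.0, -2.0]]  # illustrative two-state rate matrix
    P = expm_series(Q, 0.5)
    rng = random.Random(1)
    runs = 20_000
    frac = sum(state_at_time(Q, 0, 0.5, rng) == 0 for _ in range(runs)) / runs
    print("P_00(0.5) from series:", P[0][0], " from jump simulation:", frac)
```

The plain power series is adequate only for small matrices and moderate values of t; it is used here because it mirrors Eq. (16.99) directly.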


Summary 

This chapter introduced the concept of stochastic processes Y_X(t) as ‘time’ dependent processes depending on randomness. Y was a random variable which depended on another random variable X and t, the time. All realizations of Y_X(t) spanned
the  state  space.  Each  stochastic  process  was  coupled  to  a  pdf  which  described  the 
probability  that  the  process  took  on  the  realization  y  at  time  t.  In  the  course  of 
this  introduction  a  series  of  general  properties  which  classify  such  processes  were 
defined.  This  was  followed  by  the  discussion  of  a  particular  class  of  stochastic 
processes,  the  Markov  processes.  They  had  the  remarkable  property  that  a  future 
realization  of  the  process  solely  depended  on  its  current  realization  and  not  on 
the  history  how  this  current  realization  had  been  reached  (Markov  property).  A 
huge  class  of  processes  in  physics  and  related  sciences  is  Markovian  in  nature. 
The next refinement in our discussion was the introduction of MARKOV-chains. These were processes defined on a discrete time span and in a discrete state space. This allowed us to replace the pdfs by probabilities. Again, various specific properties of MARKOV-chains opened the possibility of a distinctive classification. A very important observation was that under certain conditions a MARKOV-chain reached a
stationary  or  equilibrium  distribution  and  that  it  definitely  arrived  at  this  distribution 
independent  of  the  choice  of  initial  conditions.  Moreover,  detailed  balance  was 
obeyed  by  this  equilibrium  condition.  This  observation  was  the  backbone  of 
MARKOV-chain  Monte  Carlo  sampling  methods,  in  particular  of  the  METROPOLIS 
algorithm.  Finally,  continuous-time  MARKOV-chains  were  discussed. 


Problems 

1.  Write  a  program  to  simulate  the  Wiener  process  in  one  dimension.  Follow  the 
method  explained  in  Sect.  16.2  and  perform  the  following  analysis: 

a.  Illustrate  graphically  some  typical  sample  paths. 

b. Calculate the mean ⟨x(t)⟩ and the variance var[x(t)] by restarting the process several times with different seeds and plot the result.

c.  Measure  the  position  x  of  the  particle  at  a  particular  time  t  for  several  runs 
(with  different  seeds)  and  illustrate  the  result  p(x,  t)  graphically. 

2. Realize numerically a POISSON process according to the instructions given in Sect. 16.2. Again, plot some typical sample paths. Moreover, calculate the mean waiting time ⟨τ⟩ as well as the variance var(τ) numerically as well as analytically.


Chapter 17

The  Random  Walk  and  Diffusion  Theory 


17.1  Introduction 


Diffusion  is  one  of  the  most  widely  spread  processes  in  science.  Its  occurrence 
ranges  from  random  motion  of  dust  particles  on  fluid  surfaces,  historically  known 
as  Brownian  motion,  to  the  motion  of  particles  in  numerous  physical  systems  [1,2], 
the  spreading  of  malaria  by  migration  of  mosquitoes  [3],  or  even  to  the  description 
of  fluctuations  in  stock  markets  [4] . 

As an example, let us regard N neutral, identical, classical particles which solely interact through collisions, for instance an H_2 gas in a box, where N = N_A ≈ 6.022 × 10^23. We are interested in the dynamics of one particle under the influence of all others and under no influence by an external force; we expect that diffusion will be the dominating process. From the microscopic point of view such a situation can be described with the help of N coupled Newton’s equations of motion (see Chap. 7). However, such a task will not be feasible due to the size of the system, i.e. the magnitude of N. Instead, a statistical description can be obtained from Boltzmann’s equation [5]


\frac{d}{dt} f(r, \eta, t) = \left. \frac{\partial}{\partial t} f(r, \eta, t) \right|_{\text{coll.}} ,  (17.1)

where f(r, η, t) is the phase space distribution function. Hence, f(r, η, t) dr dη is the number of particles of momentum η within the phase-space volume dr dη which is centered around position r at time t. We have, in particular:

\frac{\partial}{\partial t} f(r, \eta, t) + \frac{\eta}{m} \cdot \frac{\partial}{\partial r} f(r, \eta, t) + F \cdot \frac{\partial}{\partial \eta} f(r, \eta, t) = C[f](r, \eta, t) .  (17.2)

Here C[f](r, η, t) is the collision integral and F describes an external force. In cases where collisions result solely from two-body interactions between particles that are assumed to be uncorrelated prior to the collision,^1 the collision integral can be described by

C[f](r, \eta, t) = \int d\xi_1 \int d\xi_2 \int d\xi_3 \, g(\xi_1, \xi_2, \xi_3, \eta) \left[ f(r, \xi_1, t) f(r, \xi_2, t) - f(r, \eta, t) f(r, \xi_3, t) \right] ,  (17.3)

where g(ξ_1, ξ_2, ξ_3, η) accounts for the probability that a collision between two particles of initial momenta ξ_1 and ξ_2 and final momenta ξ_3 and η occurs. This function depends on the particular type of particles under investigation and has, in general, to be determined from a microscopic theory.^2 We now define the particle density^3 ρ(r, t) as a function of space r and time t via

\rho(r, t) = \int d\eta \, f(r, \eta, t) .  (17.4)


A complicated mathematical analysis of Eq. (17.1) results in a diffusion equation of the well-known form

\frac{\partial}{\partial t} \rho(r, t) = D \frac{\partial^2}{\partial r^2} \rho(r, t) ,  (17.5)

if collisions dominate the dynamics (diffusion limit). Here D = const is the diffusion coefficient of dimension length^2 × time^-1. Note that

\int dr \, \rho(r, t) = N ,  (17.6)

is the number of particles within our system. Thus, in our example we can interpret diffusion as the average evolution of the integrated phase space distribution function governed by collisions between particles. Such an interpretation will certainly not hold in the case of fluctuations in stock markets or in the case of the spreading of malaria because typically mosquitoes do not collide with humans.

It is the aim of the first part of this chapter to present a purely stochastic approach to diffusion, the so-called random walk model [7, 8]. This stochastic description will prove to have several valuable advantages: (i) We will be able to identify criteria for the validity of the diffusion model even for systems lacking a straight-forward physical interpretation. (ii) The stochastic formulation will give us the opportunity to perform diffusion ‘experiments’ on the computer without much computational effort as the methods employed are based on algorithms discussed in previous chapters. (iii) Within this framework it will be an easy task to generalize the approach to stochastic models of anomalous diffusion [9]: The fractal time random walk and Lévy flight models [10]. These models play an increasingly important role in modern statistical physics.

^1 This assumption is known as the approximation of molecular chaos. In fact it represents the Markov approximation to the dynamics of a many particle system.

^2 For instance, one can employ Fermi’s golden rule [6] to obtain this function on a quantum mechanical level. We already came across an expression of the form (17.3) on the right hand side of the master equation, see Sect. 16.3, Eq. (16.42). However, the collision integral of the Boltzmann equation is non-linear.

^3 The function ρ(r, t) is referred to as a physical distribution function due to the normalization condition (17.6). This is in contrast to distribution functions we encountered so far within this book, which are normalized to unity.


17.2  The  Random  Walk 

The  random  walk  is  one  of  the  classical  examples  of  MARKOV-chains  [11-13]. 
In  this  section  we  discuss  some  of  the  basic  properties  of  random  walks  in  one 
dimension.  For  convenience,  we  are  going  to  use  the  familiar  picture  of  one  diffusing 
particle. 


Basics 

The random walk [8] is defined as the motion of a single particle which moves at the time instances

0, \Delta t, 2\Delta t, \ldots, n\Delta t, \ldots  (17.7)

between grid-points

\ldots, -n\Delta x, \ldots, -\Delta x, 0, \Delta x, \ldots, n\Delta x, \ldots  (17.8)

For a more transparent notation the lattice point nΔx, with n ∈ ℤ, will be denoted by x_n and the instance kΔt, with k ∈ ℕ, will be denoted by t_k. This notation follows the conventions of Chap. 2. The initial position is given by

\Pr[X(t_0 = 0) = x_i] = \delta_{i0} ,  (17.9)

and the transition rates p_ij from position i to position j within a single time step Δt are defined as

\Pr[X(t_{n+1}) = x_i \,|\, X(t_n) = x_j] = p \, \delta_{j, i-1} + q \, \delta_{j, i+1} + r \, \delta_{ij} .  (17.10)

Here p denotes the probability that the particle jumps to the neighboring grid-point on the right-hand side, q stands for the probability that the particle jumps to the neighboring grid-point on the left-hand side, and r denotes the probability of staying at the same grid-point within this time step. Naturally, we have

p + q + r = 1 .  (17.11)

Consequently, we have a MARKOV-chain with time instances t_n and a state space spanned by the positions x_k. Moreover, we note that the stochastic process is clearly irreducible since all states communicate with each other (see Sect. 16.4). Hence, it follows that either all states are recurrent or all states are transient. Furthermore, in the case that r ≠ 0 the MARKOV-chain is aperiodic, otherwise the chain is periodic with periodicity d = 2 because it takes at least two steps to return to the starting position.

We concentrate first on the classical random walk, that is, a one-dimensional random walk with Δt = Δx = 1, r = 0, and p + q = 1. This ensures that the probability of remaining in the current position within one time step is equal to zero. If, furthermore, p = q = 1/2 the random walk is referred to as unbiased and for p ≠ q we call it biased. We write the position X(t_n) = x_n at time t_n = n as

x_n = \sum_{i=1}^{n} \xi_i ,  (17.12)

where ξ_i ∈ {−1, 1} and Pr(ξ_i = +1) = p, Pr(ξ_i = −1) = q. Let us assume that within these n steps the particle moved m times to the right and, consequently, n − m times to the left. The actual position x_n after n steps can then be determined from

x_n = m - (n - m) = 2m - n = k ,  (17.13)

where we used that x_0 = 0. It is interesting to calculate the probability Pr(x_n = k) to find the particle after n time steps at some particular position k. This is simply the sum over all paths along which the particle moved m = (n + k)/2 times to the right and n − m = (n − k)/2 times to the left multiplied by the probability for m steps to the right and n − m steps to the left. In total, this yields \binom{n}{m} = \binom{n}{(n+k)/2} different contributions and we have

\Pr(x_n = k) = \binom{n}{m} p^m q^{n-m} = \binom{n}{(n+k)/2} p^{(n+k)/2} q^{(n-k)/2} .  (17.14)


In particular, we find for the unbiased random walk:

\Pr(x_n = k) = \binom{n}{(n+k)/2} \left( \frac{1}{2} \right)^n .  (17.15)

Due to the periodicity of the classical random walk, k can only take on the values k = −n, −n + 2, ..., n − 2, n. Consequently, n ± k has to be even. For all other values of k we have Pr(x_n = k) = 0. Furthermore,

\sum_{k=-n,\ n \pm k\ \text{even}}^{n} \Pr(x_n = k) = \sum_{m=0}^{n} \binom{n}{m} p^m q^{n-m} = (p + q)^n = 1 ,  (17.16)

and the probability of finding the particle at time n within [−n, n] is equal to one. A simple algorithm to simulate the one-dimensional biased random walk consists of the following steps:

1. Define values x_0, p, and q = 1 − p.

2. Draw a uniformly distributed random number r ∈ [0, 1].

3. If r < p set x_{n+1} = x_n + 1, otherwise set x_{n+1} = x_n − 1.

4. Return to step 2.

In Fig. 17.1 we present three different realizations of an unbiased one-dimensional random walk for (a) N = 50, (b) N = 100, and (c) N = 1000 consecutive steps.

Comparison between Figs. 17.1 and 16.2 already suggests a connection between the random walk and the Wiener process and we shall come back to this point in the course of this chapter.

Moments 

Let us briefly elaborate on the moments of the random walk (see Appendix, Sect. E.2). The first moment or expectation value ⟨x_n⟩ is given by

\langle x_n \rangle = \sum_{k=-n,\ n \pm k\ \text{even}}^{n} k \binom{n}{(n+k)/2} p^{(n+k)/2} q^{(n-k)/2}
                    = \sum_{m=0}^{n} (2m - n) \binom{n}{m} p^m q^{n-m}
                    = 2 \langle m \rangle - n
                    = n(2p - 1) .  (17.17)

Fig. 17.1 Three different realizations of an unbiased one-dimensional random walk for (a) N = 50, (b) N = 100, and (c) N = 1000 time steps and different seeds


We now introduce a bias v such that

p = \frac{1}{2}(1 + v) \quad \text{and} \quad q = \frac{1}{2}(1 - v) ,  (17.18)

and obtain

\langle x_n \rangle = n v .  (17.19)

We calculate the second moment ⟨x_n^2⟩ using the above method and get:

\langle x_n^2 \rangle = n(1 - v^2) + n^2 v^2 .  (17.20)

The variance var(x_n) follows immediately:

\text{var}(x_n) = \langle x_n^2 \rangle - \langle x_n \rangle^2 = n(1 - v^2) .  (17.21)

We note the following: The expectation value ⟨x_n⟩ moves according to Eq. (17.19) with a uniform velocity defined by the bias v = p − q. In particular, for the unbiased random walk v = 0 and, thus, ⟨x_n⟩ = 0 for all n. Furthermore, we observe that var(x_n) increases linearly with time n, a property we already noted for the Wiener process in Sect. 16.3, and it is maximal for v = 0. For v = ±1, which describes a pure drift motion in the positive or negative x direction, the variance is equal to zero.
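
The moments (17.19) and (17.21) can be compared with sample estimates from repeated walks. The sketch below assumes the parametrization p = (1 + v)/2 of Eq. (17.18); the function name, the seed and the parameter values are illustrative.

```python
import random
import statistics


def final_positions(n_steps, v, n_walks, seed=1):
    """Return x_n for n_walks independent biased walks with p = (1 + v) / 2."""
    rng = random.Random(seed)
    p = 0.5 * (1.0 + v)
    positions = []
    for _ in range(n_walks):
        x = 0
        for _ in range(n_steps):
            x += 1 if rng.random() < p else -1
        positions.append(x)
    return positions


if __name__ == "__main__":
    n, v = 100, 0.2  # illustrative values
    xs = final_positions(n, v, 5_000)
    print("sample mean:    ", statistics.fmean(xs), " expected n v        =", n * v)
    print("sample variance:", statistics.pvariance(xs), " expected n(1 - v^2) =", n * (1 - v * v))
```

Both estimates carry statistical errors that shrink with the number of walks, so agreement is expected only within that sampling uncertainty.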


Recurrence 


Let us briefly investigate the recurrence behavior of the random walk. We are interested in the probability f_00^(2l) of a first return to the origin x_0 = 0 after 2l steps.

We already know that f_00^(2l) ∝ p^l q^l from our previous analysis. In the very first time step the particle moves either to x_1 = 1 or to x_1 = −1 and, consequently, within the following 2l − 2 steps it must not cross or touch the line x_k = 0 and the particle has to terminate at position x_{2l−1} = x_1. Therefore, the walker performs l − 1 steps to the left and l − 1 steps to the right within these 2l − 2 steps. The total number of possible paths N from x_1 to x_{2l−1} = x_1 is, thus, given by

N = \binom{2l - 2}{l - 1} .  (17.22)


Moreover, N may also be written as the sum of N_c paths which cross or touch the line x_k = 0 and N_nc paths which do not cross or touch the line x_k = 0, i.e.

N = N_c + N_{nc} .  (17.23)

Obviously, we are only interested in the paths which do not cross or touch the line x_k = 0. We employ the reflection principle to solve this problem. In general, the number of paths which go from x_1 = i > 0 to x_{k+1} = j > 0 within k steps and cross or touch the line x_l = 0 is equal to the total number of paths which go from x_1 = −i to x_{k+1} = j, as is schematically illustrated in Fig. 17.2.

Fig. 17.2 Illustration of the reflection principle

Let us regard the case x_1 = 1: The walker moved in the first step to the right. Hence, from the reflection principle we obtain that the number of paths from x_1 to x_{2l−1} = x_1 in 2l − 2 steps which cross or touch the line x_k = 0 is given by the total number of paths from −x_1 to x_{2l−1} = x_1. Thus, N_c is determined by:

N_c = \binom{2l - 2}{l} .  (17.24)

We note that in this picture, the walker moves l steps to the right and l − 2 steps to the left. Hence, we obtain that the number of paths which do not cross or touch the line x_k = 0 is given by

2 N_{nc} = 2 (N - N_c) = \frac{1}{2l - 1} \binom{2l}{l} .  (17.25)

The prefactor 2 accounts for the fact that the walker can move in its first step either to x_1 = −1 or to x_1 = 1. Thus, the probability for the first return of the particle after 2l steps is described by:

f_{00}^{(2l)} = \frac{1}{2l - 1} \binom{2l}{l} p^l q^l .  (17.26)

We calculate the recurrence probability according to Eq. (16.66) and this results in

f_{00} = \sum_{l=1}^{\infty} f_{00}^{(2l)} = \begin{cases} 1 & \text{for } p = q = \frac{1}{2} , \\ 2p & \text{for } p < q , \\ 2q & \text{for } p > q , \end{cases}  (17.27)

with the consequence that the one-dimensional random walk is only recurrent in the unbiased case v = 0.
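
The path counting behind the first-return probability can be verified by brute force. The sketch below counts, by dynamic programming over lattice positions, all paths of 2l steps that start at the origin, avoid it in between and return at step 2l, and compares the count with the closed form (1/(2l − 1)) C(2l, l) of the reflection-principle argument. It also sums the first-return probabilities for a biased walk, for which the sum should approach 1 − |p − q|. All function names and the cut-off l_max are illustrative assumptions.

```python
import math


def count_first_return_paths(l):
    """Count 2l-step paths from 0 that return to 0 for the first time at step 2l."""
    counts = {1: 1, -1: 1}                    # positions after the first step
    for _ in range(2 * l - 2):                # intermediate steps must avoid 0
        nxt = {}
        for x, c in counts.items():
            for y in (x - 1, x + 1):
                if y != 0:
                    nxt[y] = nxt.get(y, 0) + c
        counts = nxt
    # the final step has to lead from +1 or -1 to the origin
    return counts.get(1, 0) + counts.get(-1, 0)


def first_return_probability(l, p):
    """First-return probability after 2l steps for right-step probability p."""
    q = 1.0 - p
    return math.comb(2 * l, l) / (2 * l - 1) * (p * q) ** l


def recurrence_probability(p, l_max=400):
    """Partial sum of the first-return probabilities up to 2 * l_max steps."""
    return sum(first_return_probability(l, p) for l in range(1, l_max + 1))


if __name__ == "__main__":
    for l in range(1, 7):
        print(l, count_first_return_paths(l), math.comb(2 * l, l) // (2 * l - 1))
    print("p = 0.3:", recurrence_probability(0.3), " 1 - |p - q| =", 1 - abs(0.3 - 0.7))
    print("p = 0.5:", recurrence_probability(0.5), "(converges slowly towards 1)")
```

For p = q = 1/2 the partial sums approach one only like 1 − 1/sqrt(π l_max), so the truncated value stays visibly below one.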

Another possibility to demonstrate the recurrence of the unbiased one-dimensional random walk is provided by Eq. (16.69). The probability that a walker returns to x_0 = 0 after 2n steps is given by

P^{(2n)}(x_0) = \binom{2n}{n} p^n q^n = \frac{(2n)!}{n! \, n!} (pq)^n .  (17.28)

In this case we are not interested in the question whether or not it is the particle’s first return. By Stirling’s approximation [Appendix, Eq. (E.20)] we have

n! \approx n^{n + 1/2} e^{-n} \sqrt{2\pi} ,  (17.29)

and we obtain for P^{(2n)}(x_0):

P^{(2n)}(x_0) \approx \frac{(4pq)^n}{\sqrt{\pi n}} .  (17.30)

We assume p < 1/2 and since pq = p(1 − p) < 1/4 one gets

\sum_{n=1}^{\infty} P^{(2n)}(x_0) = \infty \quad \text{only for} \quad p = q = \frac{1}{2} .  (17.31)

The same argument holds for p > 1/2 since we can write pq = (1 − q)q < 1/4. According to Eq. (16.69) this means that the process is recurrent only for p = q, in accordance with our previous result (17.27), and transient otherwise. We note that this agrees also with the more physical picture of an external force inducing a bias or drift velocity v ≠ 0.
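
The asymptotic form (17.30) and the divergence criterion (17.31) can be inspected numerically. The following sketch compares the exact return probability with its Stirling estimate and accumulates partial sums with a term-by-term recursion, which avoids very large binomial coefficients. The function names and cut-offs are illustrative assumptions.

```python
import math


def return_prob_exact(n, p):
    """Probability C(2n, n) (pq)^n of being at the origin after 2n steps."""
    return math.comb(2 * n, n) * (p * (1.0 - p)) ** n


def return_prob_stirling(n, p):
    """Stirling estimate (4pq)^n / sqrt(pi n) of the same probability."""
    return (4.0 * p * (1.0 - p)) ** n / math.sqrt(math.pi * n)


def partial_sum_return_probs(p, n_max):
    """Sum of the return probabilities for n = 1, ..., n_max."""
    pq = p * (1.0 - p)
    term, total = 1.0, 0.0
    for n in range(1, n_max + 1):
        # ratio of consecutive terms: C(2n, n) / C(2n - 2, n - 1) = 2n (2n - 1) / n^2
        term *= (2 * n) * (2 * n - 1) / (n * n) * pq
        total += term
    return total


if __name__ == "__main__":
    print("n = 100, p = 0.5:", return_prob_exact(100, 0.5), return_prob_stirling(100, 0.5))
    for n_max in (100, 10_000):
        print("p = 0.5, partial sum up to", n_max, ":", partial_sum_return_probs(0.5, n_max))
        print("p = 0.4, partial sum up to", n_max, ":", partial_sum_return_probs(0.4, n_max))
```

For p = 0.4 the partial sums settle at a finite value, whereas for p = 0.5 they keep growing roughly like the square root of n_max.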

It should be noted that the unbiased random walk in two dimensions is also recurrent while it can be proved to be transient in higher dimensions. For instance, the recurrence probability is approximately 0.34 in 3D.^4

^4 This is one of Pólya’s random walk constants [14-16].


17.3  The Wiener Process and Brownian Motion

It is the purpose of this section to demonstrate that the Wiener process is the scaling limit of the random walk. Moreover, we discuss briefly the LANGEVIN equation and derive the diffusion equation.

As a starting point we consider the one-dimensional unbiased random walk on an equally spaced grid according to Eq. (17.8) and time instances given by Eq. (17.7). We denote the stochastic process by X_n = X(t_n) and it is described by

X_n = \Delta x \sum_{i=1}^{n} \xi_i ,  (17.32)

where ξ_i ∈ {−1, 1} together with X_0 = 0. Since we regard the unbiased case, we have Pr(ξ_i = ±1) = 1/2, ⟨ξ_i⟩ = 0, and var(ξ_i) = 1. This is equivalent to

\langle X_n \rangle = 0 \quad \text{and} \quad \text{var}(X_n) = n \Delta x^2 ,  (17.33)

as we already demonstrated in the previous section, Eq. (17.21). The variance var(X_n) can be reformulated as

\text{var}(X_n) = t_n \frac{\Delta x^2}{\Delta t} ,  (17.34)


using the definition t_n = nΔt. The simultaneous limit Δt, Δx → 0 is now performed in such a way that

\lim_{\Delta t \to 0,\ \Delta x \to 0} \frac{\Delta x^2}{\Delta t} = D = \text{const} ,  (17.35)

with D the diffusion coefficient. This limit is known as the continuous limit and it will be denoted by the operator ℒ. Hence, in the continuous limit Eq. (17.34) results in

\mathcal{L}[\text{var}(X_n)] = D t ,  (17.36)

where we renamed t_n = t. We also note that the limit Δt → 0 for constant t is equivalent to n → ∞ and we obtain in accordance with the central limit theorem (see Appendix, Sect. E.8):

\mathcal{L}(X_n) \to W_t \sim \mathcal{N}(0, D t) .  (17.37)

Here 𝒩(0, Dt) denotes the normal distribution of mean zero and variance Dt, Appendix Eq. (E.43). Furthermore, the symbol W_t was introduced to represent the Wiener process and the symbol ~ stands, within this context, for the notion follows the distribution. If W_t describes a Wiener process it is necessary to prove that W_t has independent increments W_{t_2} − W_{t_1} which follow, according to Sect. 16.3, a normal distribution with mean zero and a variance proportional to t_2 − t_1. This is demonstrated quite easily: We learn from our discussion of the random walk that

X_n - X_m = \Delta x \left( \sum_{i=1}^{n} \xi_i - \sum_{i=1}^{m} \xi_i \right) = \Delta x \sum_{i=m+1}^{n} \xi_i ,  (17.38)

and, therefore, X_n − X_m and X_m − X_k are clearly independent for n > m > k and it follows that also W_t − W_s and W_s − W_u are independent. Furthermore, we have

X_n - X_m \stackrel{d}{=} X_{n-m} ,  (17.39)

where the symbol ‘\stackrel{d}{=}’ stands for the notion to follow the same distribution or to be distributionally equivalent. Therefore, in the limit ℒ for t > s

W_t - W_s \stackrel{d}{=} W_{t-s} \sim \mathcal{N}[0, D(t - s)] ,  (17.40)

which completes the proof. We note that the particular case D = 1 is commonly referred to as the standard Wiener process. We remark that in many cases the terms Wiener process and Brownian motion are used as synonyms for a stochastic process satisfying the above properties. However, strictly speaking, the stochastic process is the Wiener process while Brownian motion is the physical phenomenon which can be described by the Wiener process.

If we suppose that p ≠ q then

\mathcal{L}(\langle X_n \rangle) = v t ,  (17.41)

with the drift constant v, and we obtain a Wiener process with a drift term

\mathcal{L}(X_n) \to \widetilde{W}_t = v t + W_t .  (17.42)

This process behaves like W_t with the only difference that it fluctuates around mean vt instead of mean zero. Note that for v > 0 the mean ⟨\widetilde{W}_t⟩ increases, while for v < 0 it decreases with time t.

Another interesting property of the Wiener process is its self-similarity. In particular, we have the property that for a > 0

W_t \stackrel{d}{=} a^{-1/2} W_{at} ,  (17.43)

with the consequence that it is completely sufficient to study the properties of the Wiener process for t ∈ [0, 1] to know its properties for arbitrary time intervals. Relation (17.43) follows from the fact that W_t ~ 𝒩(0, Dt).

Furthermore, white noise, η(t), is defined as the formal derivative of the Wiener process W_t with respect to time. We give its most important properties without going into details^5:

\langle \eta(t) \rangle = 0 \quad \text{and} \quad \langle \eta(t) \eta(s) \rangle = \delta(t - s) .  (17.44)


^5 In fact, it can be shown that W_t is non-differentiable with probability one. This is the reason why white noise is defined as the formal derivative of W_t. Let φ(t) be a test function and f(t) an arbitrary function which does not need to be differentiable with respect to t. Then the formal derivative \dot{f}(t) is defined by

\int_0^{\infty} dt \, \dot{f}(t) \varphi(t) = - \int_0^{\infty} dt \, f(t) \dot{\varphi}(t) .
Fig. 17.3 Three different realizations of the standard Wiener process with drift v = 1 according to Eq. (17.42). The expectation value ⟨x⟩ = vt of the process is presented as a dashed line

White noise is referred to as Gaussian white noise if η(t) follows a normal distribution.

Figure 17.3 presents three different realizations of the standard Wiener process with drift according to Eq. (17.42). The curves in this figure were generated using the procedure outlined in Sect. 16.3 in connection with Fig. 16.2.

Let us derive the diffusion equation from the random walk model. The probability Pr(x, t) of finding the particle at time t at position x is expressed by

\Pr(x, t) = \Pr(x, t - \Delta t) \, r + \Pr(x - \Delta x, t - \Delta t) \, p + \Pr(x + \Delta x, t - \Delta t) \, q
          = \Pr(x, t - \Delta t) (1 - p - q) + \Pr(x - \Delta x, t - \Delta t) \, p + \Pr(x + \Delta x, t - \Delta t) \, q ,  (17.45)

where we made use of relation (17.11). The interpretation of this equation is straight-forward: The probability to find the particle at the position-time point (x, t) is the sum of three terms. The first term describes the probability that the particle arrived already at position x in the previous time step t − Δt and that it will stay there during the next time step. The remaining two terms describe the probability that the particle arrived at position x − Δx (x + Δx) in the previous time step and that it will move one step to the right (left) in the next time step. Each particular term is now expanded into a Taylor series up to order O(Δx^2) and O(Δt), respectively. This requires the transition from a discrete to a continuous state space and, consequently, the probabilities Pr(·) are replaced by pdfs p(·). We get


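p(x, t) = (1 - p - q) \left[ p(x, t) - \Delta t \frac{\partial p(x, t)}{\partial t} \right]
        + p \left[ p(x, t) - \Delta t \frac{\partial p(x, t)}{\partial t} - \Delta x \frac{\partial p(x, t)}{\partial x} + \frac{1}{2} \Delta x^2 \frac{\partial^2 p(x, t)}{\partial x^2} \right]
        + q \left[ p(x, t) - \Delta t \frac{\partial p(x, t)}{\partial t} + \Delta x \frac{\partial p(x, t)}{\partial x} + \frac{1}{2} \Delta x^2 \frac{\partial^2 p(x, t)}{\partial x^2} \right] ,  (17.46)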


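and furthermore:

\frac{\partial p(x, t)}{\partial t} = - \frac{(p - q) \Delta x}{\Delta t} \frac{\partial p(x, t)}{\partial x} + \frac{(p + q) \Delta x^2}{2 \Delta t} \frac{\partial^2 p(x, t)}{\partial x^2} .  (17.47)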


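We take the continuous limit and define the drift constant

v = \mathcal{L} \left[ (p - q) \frac{\Delta x}{\Delta t} \right] = \lim_{\Delta t \to 0,\ \Delta x \to 0} (p - q) \frac{\Delta x}{\Delta t} ,  (17.48)

the diffusion constant

D = \mathcal{L} \left[ (p + q) \frac{\Delta x^2}{2 \Delta t} \right] = \lim_{\Delta t \to 0,\ \Delta x \to 0} (p + q) \frac{\Delta x^2}{2 \Delta t} ,  (17.49)

and arrive at the one-dimensional diffusion equation with drift term:

\frac{\partial p(x, t)}{\partial t} = - v \frac{\partial p(x, t)}{\partial x} + D \frac{\partial^2 p(x, t)}{\partial x^2} .  (17.50)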


This  equation  is  referred  to  as  a  Fokker-Planck  equation  [17].  In  the  specific  case 
p  —  q  the  drift  term  disappears  and  we  obtain,  as  expected,  the  classical  diffusion 
equation 


d  d2 

g- p(x,t)  =  Da rjp(x’t)  ,  (17.51) 

which we already solved numerically in Chaps. 9 and 11. It follows from this
discussion that the position of a diffusing particle can be described as a stochastic
process where, in the continuous limit, the jump lengths follow a normal distribution.
In addition, we know from our discussion of continuous-time Markov-chains
in Sect. 16.5 that the waiting times between two successive jumps
follow an exponential distribution. These insights will serve as a starting point in
the discussion of general diffusion models in Sect. 17.4. Moreover, we note that the
anisotropy of the jump-length distribution is a model for the presence of an external
field which manifests itself in a drift term.
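
The relation between the discrete walk and the constants of Eqs. (17.48) and (17.49) can be checked numerically. The following is a minimal sketch, not taken from the text: the function names and all parameter values are illustrative assumptions. It simulates walkers that step right with probability p, left with probability q, and rest otherwise, and compares the sample mean and variance of the end points with $vt$ and $2Dt$ evaluated at finite $\Delta x$ and $\Delta t$.

```python
import random


def biased_walk_endpoints(p, q, dx, n_steps, n_walkers, seed=0):
    """Final positions of n_walkers independent biased random walks.

    In each time step a walker moves +dx with probability p, -dx with
    probability q, and stays where it is with probability 1 - p - q.
    """
    rng = random.Random(seed)
    endpoints = []
    for _ in range(n_walkers):
        x = 0.0
        for _ in range(n_steps):
            u = rng.random()
            if u < p:
                x += dx
            elif u < p + q:
                x -= dx
        endpoints.append(x)
    return endpoints


def mean_and_variance(values):
    """Sample mean and unbiased sample variance of a list of numbers."""
    m = sum(values) / len(values)
    var = sum((v - m) ** 2 for v in values) / (len(values) - 1)
    return m, var


if __name__ == "__main__":
    # Illustrative parameters (assumptions, not values from the text).
    p, q, dx, dt, n_steps = 0.3, 0.2, 0.1, 0.01, 400
    ends = biased_walk_endpoints(p, q, dx, n_steps, n_walkers=2000)
    mean, var = mean_and_variance(ends)
    t = n_steps * dt
    v = (p - q) * dx / dt            # drift constant at finite dx, dt
    D = (p + q) * dx**2 / (2 * dt)   # diffusion constant at finite dx, dt
    print("mean:", mean, "expected v*t:", v * t)
    print("variance:", var, "expected 2*D*t:", 2 * D * t)
```

For finite step sizes the exact variance of the walk is $n\,\Delta x^2\,[(p+q) - (p-q)^2]$, so the agreement with $2Dt$ is only approximate; the difference is of second order in $(p-q)$.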
An alternative approach to the formal description of Brownian motion goes back
to LANGEVIN. He considered the classical equation of motion of a particle in a fluid,
which reads

$$
\dot{v} = -\beta v ,
\tag{17.52}
$$

where $\beta$ denotes the friction coefficient and we set the particle's mass $m$ equal to
one. LANGEVIN argued that this equation may only be valid for the average motion
of the particle, which corresponds to the long-time behavior of the motion of massive
particles. However, if the particle is not heavy, its trajectory can be strongly
affected by collisions with the solvent's molecules. He supposed that a reasonable
generalization of Eq. (17.52) should be of the form [18]

$$
\dot{v} = -\beta v + F(t) ,
\tag{17.53}
$$

where $F(t)$ is a random force. In particular, $F(t)$ is a stochastic process which
satisfies

$$
\langle F(t) \rangle = 0 \quad \text{and} \quad \langle F(t) F(t') \rangle = A\,\delta(t - t') ,
\tag{17.54}
$$

where $A$ is a constant, and we obtain

$$
F(t) = \sqrt{A}\,\eta(t) .
\tag{17.55}
$$

Equation (17.53) is referred to as the LANGEVIN equation and it is the prototype
stochastic differential equation. Based on the definition of white noise $\eta(t)$ the
LANGEVIN equation can be rewritten:

$$
\mathrm{d}v = -\beta v\,\mathrm{d}t + \sqrt{A}\,\mathrm{d}W_t .
\tag{17.56}
$$


The solution of the LANGEVIN equation describes a stochastic process referred to
as the Ornstein-Uhlenbeck process [19]. This process is essentially the only
stochastic process which is stationary, Gaussian and Markovian. Its master equation
is a Fokker-Planck equation of the form [17]

$$
\frac{\partial}{\partial t}\,p(v,t) = \beta\,\frac{\partial}{\partial v}\left[v\,p(v,t)\right] + \frac{A}{2}\,\frac{\partial^2}{\partial v^2}\,p(v,t) ,
\tag{17.57}
$$

where $p(v,t)$ is the pdf of the Ornstein-Uhlenbeck process. If the initial
velocity $v_0$ is given, then the pdf $p(v,t)$ can be proved to be

$$
p(v,t) = \sqrt{\frac{\beta}{\pi A \left(1 - e^{-2\beta t}\right)}}\,
\exp\left[-\frac{\beta \left(v - v_0 e^{-\beta t}\right)^2}{A \left(1 - e^{-2\beta t}\right)}\right] .
\tag{17.58}
$$
It is possible to solve the Langevin equation (17.53) analytically with the result:

$$
v(t) = v_0 \exp(-\beta t) + \sqrt{A} \int_0^t \mathrm{d}t'\,\eta(t')\,\exp\left[-\beta (t - t')\right] .
\tag{17.59}
$$

We write in particular

$$
v(t_{n+1}) = v(t_n) \exp(-\beta \Delta t) + Z_n ,
\tag{17.60}
$$

with $Z_n$ defined as:

$$
Z_n = \sqrt{A} \int_0^{\Delta t} \mathrm{d}t'\,\eta(t_n + t')\,\exp\left[-\beta (\Delta t - t')\right] .
\tag{17.61}
$$


Since $\eta(t)$ was assumed to be Gaussian white noise, $Z_n$ can be proved to follow
the normal distribution

$$
Z_n \sim \mathcal{N}\left(0,\ \frac{A}{2\beta}\left[1 - \exp(-2\beta \Delta t)\right]\right) ,
\tag{17.62}
$$

which offers a very convenient way to simulate the Ornstein-Uhlenbeck
process. This particular formulation of Brownian motion allows one to model the
process by sampling the changes in the velocity, $Z_n$, from the normal distribution with
mean zero and the variance given in Eq. (17.62). The walker's position $x(t)$ can then
be obtained by approximating the velocity $v = \dot{x}$ with the help of finite difference
derivatives, as described in Chap. 2. In conclusion we remark that, although the
Langevin equation was introduced in a heuristic manner, it represents a very useful
tool due to its rather simple interpretation.

Figure  17.4  presents  three  different  realizations  of  the  Ornstein-Uhlenbeck 
process based on three different initial velocities $v_0$. The corresponding random
trajectories  x(t)  of  the  Brownian  particle  are  illustrated  in  Fig.  17.5. 
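
The update rule of Eqs. (17.60) and (17.62) translates almost literally into code. The sketch below is one possible implementation under stated assumptions: the function name `simulate_ou` is hypothetical, the position is integrated with a simple forward difference $x_{n+1} = x_n + v_n \Delta t$, and the initial position is set to zero. It is not the program used to produce Figs. 17.4 and 17.5.

```python
import math
import random


def simulate_ou(v0, beta, A, dt, n_steps, seed=0):
    """Sample an Ornstein-Uhlenbeck velocity path and the resulting position.

    The velocity follows v_{n+1} = v_n * exp(-beta*dt) + Z_n with
    Z_n ~ N(0, A/(2*beta) * (1 - exp(-2*beta*dt))), cf. Eqs. (17.60), (17.62).
    The position uses the forward difference x_{n+1} = x_n + v_n * dt.
    """
    rng = random.Random(seed)
    decay = math.exp(-beta * dt)
    sigma_z = math.sqrt(A / (2.0 * beta) * (1.0 - math.exp(-2.0 * beta * dt)))
    v = [v0]
    x = [0.0]
    for _ in range(n_steps):
        x.append(x[-1] + v[-1] * dt)
        v.append(v[-1] * decay + rng.gauss(0.0, sigma_z))
    return v, x


if __name__ == "__main__":
    # Parameters chosen as in the caption of Fig. 17.4.
    for v0 in (0.0, 5.0, 10.0):
        v, x = simulate_ou(v0, beta=1.0, A=5.0, dt=1e-2, n_steps=1000, seed=1)
        print(v0, v[-1], x[-1])
```

Because the velocity update is exact for any $\Delta t$, the step size only limits the accuracy of the position integration. For long times the sampled velocities should have mean zero and variance $A/(2\beta)$.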


17.4  Generalized  Diffusion  Models 

We now formulate a very general approach to diffusive behavior which is based on
continuous random variables. We start with the introduction of the pdf $A(x,t)$. Its
purpose is to describe the event that a particle arrives at time $t$ at position $x$. It can
be expressed as [20, 21]

$$
A(x,t) = \int \mathrm{d}x' \int \mathrm{d}t'\,\psi(x,t;x',t')\,A(x',t') ,
\tag{17.63}
$$

where $\psi(x,t;x',t')$ is the jump pdf. We offer the following interpretation:
$\psi(x,t;x',t')$ describes the probability for an event that a particle which arrived
Fig. 17.4 Three different realizations of the Ornstein-Uhlenbeck process v(t) vs t. For this
simulation we chose $\beta = 1$, $A = 5$, $dt = 10^{-2}$ and $N = 10^3$ time steps. Furthermore, we chose
three different initial velocities, i.e. $v_0 = 0$ (black), $v_0 = 5$ (gray) and $v_0 = 10$ (light gray)


Fig. 17.5 Random trajectories x(t) vs t of the Brownian particle which correspond to the velocities
v(t) illustrated in Fig. 17.4 with initial position $x_0 = 0$. Note that we used for this figure $N = 10^5$
time steps


at time $t'$ at position $x'$ (with pdf $A(x',t')$) waited at position $x'$ until the time $t$ was
reached and then jumped within an infinitesimal time interval from position $x'$ to $x$.
If we regard a space and time homogeneous process, then $\psi(x,t;x',t')$ is replaced
by $\psi(x - x', t - t')$. This allows the introduction of a jump length pdf $p(x)$ and of a
waiting time pdf $q(t)$. They are related to the jump pdf by

$$
p(x) = \int_0^\infty \mathrm{d}t'\,\psi(x,t') \quad \text{and} \quad q(t) = \int_{-\infty}^{\infty} \mathrm{d}x'\,\psi(x',t) .
\tag{17.64}
$$
If the jump length and the waiting time are independent, one
can simply write $\psi(x,t) = p(x)\,q(t)$. The probability $\varphi(x,t)$ of finding a particle at
position $x$ at time $t$ is, furthermore, given by

$$
\varphi(x,t) = \int_0^t \mathrm{d}t'\,A(x,t')\,\Psi(t - t') ,
\tag{17.65}
$$

where $\Psi(t)$ is the probability that a particle stayed at least for a time interval $t$ at the
same position, i.e.

$$
\Psi(t) = 1 - \int_0^t \mathrm{d}t'\,q(t') .
\tag{17.66}
$$


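Finally, the jump length variance $\sigma^2$ and the characteristic waiting time $\tau$ are
given by

$$
\sigma^2 = \int_{-\infty}^{\infty} \mathrm{d}x'\,x'^2\,p(x') \quad \text{and} \quad \tau = \int_0^\infty \mathrm{d}t'\,t'\,q(t') .
\tag{17.67}
$$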


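We conclude from our discussion of the Wiener process that for Brownian
motion the jump length pdf is a Gaussian and the waiting time pdf is an exponential
distribution:

$$
p(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{x^2}{2\sigma^2}\right) \quad \text{and} \quad q(t) = \frac{1}{\tau} \exp\left(-\frac{t}{\tau}\right) .
\tag{17.68}
$$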


The characteristic function [Appendix Eq. (E.54)] of the waiting time pdf $q(t)$ is
given by

$$
\hat{q}(s) = \int_0^\infty \mathrm{d}t\,q(t)\,e^{-st} = \frac{1}{s\tau + 1} ,
\tag{17.69}
$$

and we find for the jump length pdf $p(x)$:

$$
\hat{p}(k) = \int \mathrm{d}x\,e^{ikx}\,p(x) = \exp\left(-\sigma^2 k^2 / 2\right) .
\tag{17.70}
$$


For $x, t \to \infty$, i.e. $k, s \to 0$, the characteristic functions $\hat{q}(s)$ and $\hat{p}(k)$ develop the
asymptotic behavior

$$
\lim_{s \to 0} \frac{1}{1 + s\tau} \approx 1 - s\tau + O(s^2) ,
\tag{17.71}
$$

and

$$
\lim_{k \to 0} \exp\left(-\sigma^2 k^2 / 2\right) \approx 1 - \sigma^2 k^2 / 2 + O(k^4) .
\tag{17.72}
$$

In fact, it can be shown that any pair of jump length and waiting time pdfs leads in
first order to the same asymptotic behavior, namely $O(\tau)$ and $O(\sigma^2)$, as long as the
moments $\tau$ and $\sigma^2$ exist.

However,  there  is  a  variety  of  processes  which  cannot  be  accounted  for  within 
the  basic  framework  of  Brownian  motion.  Such  processes  are  described  within  the 
concept  of  anomalous  diffusion  [9,  20].  Examples  are,  for  instance,  the  foraging 
behavior  of  spider  monkeys,  particle  trajectories  in  a  rotating  flow,  diffusion  of 
proteins  across  cell  membranes,  diffusion  of  tracers  in  polymer-like  breakable 
micelles,  the  traveling  behavior  of  humans,  charge  carrier  transport  in  disordered 
organic  molecules,  etc. 

We  concentrate  now  on  two  particular  models  of  anomalous  diffusion.  The  first 
model can, from a qualitative point of view, be characterized as a diffusion process
which consists of small clustering jumps which are interspersed with very long flights.
Such  behavior  is,  for  instance,  encountered  in  the  context  of  human  travel  behavior, 
Fig.  17.6  [22],  charge  carrier  transport  in  disordered  solids,  etc.  The  incorporation 
of  these  long  jumps  on  a  stochastic  level  is  referred  to  as  Levy  flight  [10].  The 
second  model,  which  is  referred  to  as  the  fractal  time  random  walk  incorporates 


Fig.  17.6  Traveling  behavior  of  humans  (Adapted  from  [22] .  Copyright  ©  2006,  Rights  Managed 
by  Nature  Publishing  Group) 
anomalously long waiting times between two successive jumps. In particular, these
long waiting times account for non-Markovian effects which could be due to, for
instance, trapping processes of charge carriers in disordered solids. It has to be
emphasized at this point that the resulting diffusion models are still linear in the
pdf $\varphi(x,t)$. The inclusion of non-linear effects will not be discussed here; it can,
however, be achieved within the framework of non-extensive thermodynamics [23].

Let us start with Levy flights. In this case one modifies the asymptotic behavior
of the characteristic function of the jump length pdf according to

$$
\hat{p}(k) \propto 1 - (\sigma |k|)^{\alpha} ,
\tag{17.73}
$$

where $\alpha \in (0, 2]$. We recognize that this is the asymptotic behavior for $|k| \to 0$ of the
characteristic function of a symmetric Levy $\alpha$-stable distribution [19] following
Appendix Eq. (E.69). In the limit $\alpha \to 2$ normal, Gaussian behavior is recovered.
According to Appendix Eq. (E.70) the characteristic function (17.73) corresponds
to a jump length pdf:

$$
p(x) \propto |x|^{-\alpha - 1} \quad \text{for} \quad |x| \to \infty .
\tag{17.74}
$$


It is commonly referred to as a fat-tailed jump length pdf because of its asymptotic
behavior.

A Levy flight is, in principle, a random walk where the lengths of the jumps
at discrete time instances $t_n$ follow the pdf (17.74). In the continuous time limit,
the waiting times are distributed exponentially, as was illustrated in Sect. 16.5. It
has to be noted that in such a case the jump length variance diverges, i.e. $\sigma^2 \to \infty$.
Consequently, Levy $\alpha$-stable distributions are not subject to the central limit
theorem (see Appendix, Sect. E.8). In particular, the distance from the origin after
some finite time $t$ follows a Levy $\alpha$-stable distribution. Moreover, we note that
if $0 < \alpha < 1$ even the mean jump length $\langle x \rangle$ diverges. A detailed mathematical
analysis proves that Levy flights result in a diffusion equation of the form

$$
\frac{\partial}{\partial t}\,\varphi(x,t) = D_{\alpha}\,\mathcal{D}^{\alpha}_{|x|}\,\varphi(x,t) ,
\tag{17.75}
$$

where $D_{\alpha}$ is the fractional diffusion coefficient of dimension length$^{\alpha}$ × time$^{-1}$ and
$\mathcal{D}^{\alpha}_{|x|}$ is the symmetric Riesz fractional derivative operator of order $\alpha \in (1, 2)$ (see footnote 6):

$$
\mathcal{D}^{\alpha}_{|x|} f(x) = -\frac{1}{2\,\Gamma(2 - \alpha) \cos\left(\frac{\alpha\pi}{2}\right)} \int_{-\infty}^{\infty} \mathrm{d}x'\,\frac{f''(x')}{|x - x'|^{\alpha - 1}} ,
\tag{17.76}
$$

where $f''(x)$ is the second spatial derivative of $f$.


6  A  short  introduction  to  fractional  derivatives  and  integrals  can  be  found  in  Appendix  G. 
Fig. 17.7 Three different realizations of the one-dimensional Levy flight. The
parameters are $\ell = 0.001$, $\alpha = 1.3$ and we performed $N = 1000$ time steps


Fig. 17.8 Comparison between the two-dimensional Wiener process (solid
up-triangles) and the two-dimensional Levy flight (open squares) for $\alpha = 1.3$.
The minimal flight length of the Levy flight as well as the jump length variance of the
Wiener process were set to $\ell = \sigma^2 = 0.1$ and we performed $N = 100$ time steps


Figure 17.7 illustrates a one-dimensional Levy flight and Fig. 17.8 presents
a comparison between a two-dimensional Levy flight and a two-dimensional
Wiener process. These figures were generated by sampling an exponential
distribution with mean $\langle t \rangle = 1$ for the waiting times. On the other hand, a jump length
pdf

$$
p(x) = \alpha\,\ell^{\alpha}\,\frac{\Theta(x - \ell)}{x^{\alpha + 1}} , \quad x > 0 ,
\tag{17.77}
$$

was sampled for the jump length of the Levy flight. Here $\alpha$ is referred to as the
Levy index, $\Theta(\cdot)$ denotes the Heaviside $\Theta$ function and $\ell > 0$ is the minimal
flight length. We introduced this particular form of the pdf because it can rather
easily be sampled with the help of the inverse transformation method (Sect. 13.2)
and it obeys the asymptotic behavior, Eq. (17.74). Moreover, it can be proved that it
gives the correct behavior in the limit $\ell \to 0$. Finally, the direction of the jump has
to be sampled in an additional step. Figure 17.8 is particularly instructive because
the different physics described by these two models becomes immediately apparent.
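
The sampling procedure described above can be sketched as follows. The cumulative distribution function of the pdf (17.77) is $F(x) = 1 - (\ell/x)^{\alpha}$ for $x \geq \ell$, so the inverse transformation method gives $x = \ell\,u^{-1/\alpha}$ with $u$ uniform on $(0, 1]$. The function names are hypothetical and the code is an illustration under these assumptions, not the program behind Figs. 17.7 and 17.8.

```python
import random


def sample_jump_length(alpha, ell, rng):
    """Draw a jump length from p(x) = alpha * ell**alpha / x**(alpha + 1), x >= ell.

    Inverse transformation: F(x) = 1 - (ell/x)**alpha, hence x = ell * u**(-1/alpha).
    """
    u = 1.0 - rng.random()  # uniform on (0, 1], avoids a division by zero
    return ell * u ** (-1.0 / alpha)


def levy_flight_1d(alpha, ell, n_steps, seed=0):
    """One-dimensional Levy flight with exponential waiting times of mean 1."""
    rng = random.Random(seed)
    t = [0.0]
    x = [0.0]
    for _ in range(n_steps):
        wait = rng.expovariate(1.0)                  # waiting time
        length = sample_jump_length(alpha, ell, rng)  # jump length
        sign = 1.0 if rng.random() < 0.5 else -1.0    # jump direction
        t.append(t[-1] + wait)
        x.append(x[-1] + sign * length)
    return t, x


if __name__ == "__main__":
    # Parameters as in the caption of Fig. 17.7.
    t, x = levy_flight_1d(alpha=1.3, ell=0.001, n_steps=1000, seed=1)
    print(t[-1], x[-1], max(abs(b - a) for a, b in zip(x, x[1:])))
```

In two dimensions the sign would be replaced by a jump angle drawn uniformly from $[0, 2\pi)$.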

Let us turn our attention to the second scenario, the fractal time random walk. In
this case the asymptotic behavior of the waiting time pdf is modified according to

$$
\hat{q}(s) \propto 1 - (\tau s)^{\beta} ,
\tag{17.78}
$$

where $\beta \in (0, 1]$ and for $\beta \to 1$ regular behavior, an exponentially distributed
waiting time, is recovered. A pdf of such a form is commonly referred to as a
fat-tailed waiting time pdf. After an inverse LAPLACE transform we obtain

$$
q(t) \propto t^{-\beta - 1} \quad \text{for} \quad t \to \infty .
\tag{17.79}
$$

We note that in this case the mean waiting time $\tau = \langle t \rangle$ diverges for $\beta < 1$.
This clearly indicates a non-Markovian time evolution since we demonstrated in
Sect. 16.5 that every Markovian discrete time process converges in the continuous
time limit to a process with exponentially distributed waiting times. Again, the
ansatz

$$
q(t) = \beta\,\tau^{\beta}\,\frac{\Theta(t - \tau)}{t^{\beta + 1}} ,
\tag{17.80}
$$

is employed, where $\tau > 0$ is the minimal waiting time. The process is essentially a
random walk with waiting times distributed according to Eq. (17.80), i.e. the jump
length $\Delta x$ is constant. In the continuous space limit $\Delta x \to 0$ the jump lengths follow
a Gaussian, as in the case of a regular random walk. A detailed analysis proves that
in the limit $\tau \to 0$ the corresponding diffusion equation is given by

$$
{}^{C}\mathcal{D}_t^{\beta}\,\varphi(x,t) = D_{\beta}\,\frac{\partial^2}{\partial x^2}\,\varphi(x,t) ,
\tag{17.81}
$$

where the diffusion constant $D_{\beta}$ is of dimension length$^2$ × time$^{-\beta}$. Here, ${}^{C}\mathcal{D}_t^{\beta}$ is the
Caputo fractional time derivative of order $\beta \in (0, 1)$ (see Appendix G). It is of the
form

$$
{}^{C}\mathcal{D}_t^{\beta} f(t) = \frac{1}{\Gamma(1 - \beta)} \int_0^t \mathrm{d}t'\,\frac{f'(t')}{(t - t')^{\beta}} .
\tag{17.82}
$$

It follows from the properties of fractional derivatives that an alternative form of
Eq. (17.81) can be found, namely

$$
\frac{\partial}{\partial t}\,\varphi(x,t) = D_{\beta}\,\frac{\partial^2}{\partial x^2}\,\mathcal{D}_t^{1-\beta}\,\varphi(x,t) ,
\tag{17.83}
$$

where $\mathcal{D}_t^{1-\beta}$ is the Riemann-Liouville fractional derivative of order $1 - \beta$ (see
Appendix G).

Fig. 17.9 Three different realizations of the fractal time random walk in one
dimension for $\beta = 0.8$ and $\tau = 0.1$

Figure  17.9  presents  three  different  realizations  of  the  fractal  time  random  walk. 
The waiting times were sampled from the pdf (17.80) with the help of the inverse
transformation method (Sect. 13.2) and the jump lengths were sampled from a
normal distribution with jump length variance $\sigma^2 = 1$.
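
A minimal sketch of this procedure is given below. It assumes the waiting time pdf (17.80), whose cumulative distribution function is $F(t) = 1 - (\tau/t)^{\beta}$ for $t \geq \tau$, so that the inverse transformation method gives $t = \tau\,u^{-1/\beta}$. The function names and the stopping criterion `t_max` are illustrative assumptions and not part of the original text.

```python
import random


def sample_waiting_time(beta, tau, rng):
    """Draw a waiting time from q(t) = beta * tau**beta / t**(beta + 1), t >= tau."""
    u = 1.0 - rng.random()  # uniform on (0, 1]
    return tau * u ** (-1.0 / beta)


def fractal_time_walk(beta, tau, sigma, t_max, seed=0):
    """Fractal time random walk: fat-tailed waiting times, Gaussian jump lengths.

    Returns the jump times and the positions right after each jump.
    The walk stops before the first jump that would occur later than t_max.
    """
    rng = random.Random(seed)
    t = [0.0]
    x = [0.0]
    while True:
        t_next = t[-1] + sample_waiting_time(beta, tau, rng)
        if t_next > t_max:
            break
        t.append(t_next)
        x.append(x[-1] + rng.gauss(0.0, sigma))
    return t, x


if __name__ == "__main__":
    # Parameters as in the caption of Fig. 17.9, jump length variance 1.
    t, x = fractal_time_walk(beta=0.8, tau=0.1, sigma=1.0, t_max=100.0, seed=1)
    print(len(t) - 1, "jumps, final position", x[-1])
```

Between two jumps the walker rests, so a plot of $x(t)$ should be drawn as a step function. Occasional very long plateaus are the visible signature of the diverging mean waiting time.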

It is a straightforward task to combine fractal time random walks and Levy
flights into so-called fractal time Levy flights. The resulting diffusion equation can
be written as

$$
{}^{C}\mathcal{D}_t^{\beta}\,\varphi(x,t) = D_{\alpha\beta}\,\mathcal{D}^{\alpha}_{|x|}\,\varphi(x,t) ,
\tag{17.84}
$$

where the diffusion constant $D_{\alpha\beta}$ has units length$^{\alpha}$ × time$^{-\beta}$ and ${}^{C}\mathcal{D}_t^{\beta}$ and $\mathcal{D}^{\alpha}_{|x|}$ are
the fractional Caputo and Riesz derivatives, respectively.

Figure 17.10 illustrates three different realizations of such a diffusion process.
The waiting times were sampled from the pdf (17.80) where we set $\tau = 0.1$ and
$\beta = 0.8$. The jump lengths were sampled from the pdf (17.77) with $\alpha = 1.3$ and
$\ell = 0.01$. Finally, the direction of the jump was sampled in an additional step.

We close this chapter with a short discussion: The description of diffusion
processes with the help of stochastics has proved to be one of the most powerful
methods in modern theoretical physics. Within this chapter we discussed several
different paths toward a description of Brownian motion, namely the random
walk, the Wiener process, and the LANGEVIN equation, as well as models which
describe phenomena beyond Brownian motion. It has to be emphasized that the field
of anomalous diffusion in general is still developing rapidly; its importance
for the description of various phenomena in science is, however, already impressive. We
refer the interested reader to the excellent review articles by R. Metzler and J.
Klafter on anomalous diffusion [20, 21].
Fig. 17.10 Three possible realizations of the fractal time Levy flight in one
dimension. The parameters are $\tau = 0.1$, $\beta = 0.8$, $\ell = 0.01$ and $\alpha = 1.3$


Summary 

The random walk, a classical example of MARKOV-chains, was used to open the
door to the realm of diffusion theory. Random walks have been used for a long
time to simulate Brownian motion and related problems. From a theoretical point of
view, random walks were described by their scaling limit, the Wiener process. The
biased Wiener process was then used to demonstrate that the Fokker-Planck
equation follows in the limit of a continuous state space, just as the classical diffusion
equation follows from the unbiased Wiener process in the same limit. Brownian
motion was also the basis for the rather heuristic introduction of the stochastic
differential equation by LANGEVIN. A direct consequence of this equation was the
Ornstein-Uhlenbeck process with its master equation, the Fokker-Planck
equation. It is the only stationary, Gaussian, and Markovian process in this
class of stochastic diffusion processes. An extension of these processes was then
possible by the introduction of a jump pdf, which in turn allowed us to define a jump
length pdf and a waiting time pdf. These two pdfs resulted in a more general
description of diffusion processes in a space and time homogeneous environment.
Furthermore, the observation that many diffusive processes (not only in physics)
cannot be understood within the framework of 'classical' Brownian motion resulted
in the introduction of Levy flights. This was particularly motivated by the need
for a process whose jump-length variance diverges, which enabled, for instance, the
simulation of human travel behavior. In the very last step the fractal time random
walk was introduced. It was characterized by a specific form of the waiting time pdf
which made it possible to describe on a stochastic level anomalously long waiting
times between two consecutive jumps. Such behavior can, for instance, be observed
in trapping phenomena in solids. The combination of both extensions resulted in
the fractal time Levy flight.
Problems

1. Write a program which simulates different realizations of the following
stochastic processes in one spatial dimension:

a. A random walk.

b. A standard Wiener process and a Wiener process with drift.

c. An Ornstein-Uhlenbeck process.

d. A Levy flight.

e. A fractal time random walk.

f. A fractal time Levy flight.

Illustrate three different sample paths graphically for each process. Furthermore,
perform the following tests:

a. Calculate the expectation value $\langle x_n \rangle$ and the variance var($x_n$) of the random
walk numerically by restarting the process several times with different seeds.

b. In a similar fashion, calculate numerically $\langle W_t \rangle$ and var($W_t$).

c. Try different parameters $\alpha$, $\beta$ for Levy flights and fractal time random walks.

2. Write a program which simulates the Wiener process in two dimensions. This
can be achieved by drawing the jump length from a normal distribution and
sampling the jump angle, i.e. the direction, in an additional step. Augment this
program with Levy flight jump length pdfs.
Chapter 18

MARKOV-Chain  Monte  Carlo  and  the  Potts 
Model 


18.1  Introduction 

This  chapter  discusses  in  more  detail  the  concept  of  MARKOV-chain  Monte  Carlo 
techniques  [1-4].  We  already  came  across  the  Metropolis  algorithm  in  Sect.  14.3, 
where  the  condition  of  detailed  balance  proved  to  be  the  crucial  point  of  the  method. 
The  reason  for  imposing  such  a  condition  was  explained  in  all  required  detail  during 
our  discussion  of  MARKOV-chains  within  Sect.  16.4.  The  ISING  model,  analyzed  in 
Chap.  15,  served  as  a  first  illustration  of  the  applicability  of  MARKOV-chain  Monte 
Carlo  methods  in  physics. 

Let  us  briefly  summarize  what  we  learned  so  far:  We  discussed  several  methods 
to  sample  pseudo  random  numbers  from  a  given  distribution  in  Chap.  13.  The 
two  most  important  methods,  the  inverse  transformation  method  and  the  rejection 
method,  were  based  on  an  exact  knowledge  of  the  analytic  form  of  the  distribution 
function  which  the  random  numbers  were  supposed  to  follow.  However,  when 
simulating  the  physics  of  the  ISING  model  it  was  required  to  draw  random 
configurations  from  the  equilibrium  distribution  of  the  system  and,  unfortunately, 
the  exact  analytic  form  of  this  distribution  was  unknown.  On  the  other  hand,  in  the 
discussion  of  MARKOV-chains  we  came  across  the  condition  of  detailed  balance. 
Invoking  this  condition  ensured  that  the  constructed  MARKOV-chain  converged 
toward  a  stationary  distribution,  independent  of  the  initial  conditions.  Consequently, 
we  can  also  sample  random  numbers  by  constructing  a  MARKOV-chain  with  a 
stationary  distribution  which  is  equal  to  the  distribution  from  which  we  would  like  to 
obtain  our  random  numbers.  In  such  a  case  the  distribution  function  has  to  be  known, 
at  least  in  principle.  However,  the  formulation  of  the  METROPOLIS  algorithm 
allowed  for  an  unknown  normalization  constant  of  the  distribution  function  which, 
in  turn,  makes  this  method  such  a  powerful  tool  in  computational  physics. 

Here we plan to discuss MARKOV-chain Monte Carlo techniques in greater
detail. We start with the introduction of the concept of importance sampling, review
the Metropolis algorithm, and discuss the straightforward generalization to the


©  Springer  International  Publishing  Switzerland  2016 

B.A.  Stickler,  E.  Schachinger,  Basic  Concepts  in  Computational  Physics , 

DOI  10.1007/978-3-319-27265-8  18 




Metropolis-Hastings algorithm. Finally, the applicability of the Metropolis-Hastings
algorithm will be demonstrated by simulating the physics of the $q$-states
Potts model [5] which is closely related to the Ising model. This chapter is
closed  with  a  brief  presentation  of  some  of  the  more  advanced  techniques  within 
this  context. 


18.2  MARKOV-Chain  Monte  Carlo  Methods 

Before turning our focus toward the MARKOV-chain Monte Carlo methods we shall
briefly discuss importance sampling. Let $p(x)$ be a certain pdf from which we would
like to draw a sequence of random numbers $\{x_i\}$, $i \geq 1$. Furthermore, let $f(x)$ be
some arbitrary function whose expectation value $\langle f \rangle$ we would like to estimate.
It is determined by the integral

$$
\langle f \rangle = \int \mathrm{d}x\,f(x)\,p(x) .
\tag{18.1}
$$

But $\langle f \rangle$ can also be regarded as the expectation value $\langle a \rangle_u$ of the function $a(x) :=
f(x)\,p(x)$, with $u(x)$ the pdf of the uniform distribution. Hence, we may evaluate
$\langle a \rangle_u$ by drawing uniformly distributed random numbers on a given interval $[a, b] \subset
\mathbb{R}$ and by estimating the expectation value by its arithmetic mean as discussed in
Sect. 14.2. This approach is the easiest version of a method referred to as simple
sampling. On the other hand, we might approximate $\langle f \rangle$ by sampling $x_i$ according
to $p(x)$ and by employing the central limit theorem (see Appendix, Sect. E.8) as
was demonstrated in Sect. 14.2. The basic idea of importance sampling, however, is
to improve this approach by sampling from a different distribution $q(x)$ which is in
most cases chosen in such a way that the expectation value $\langle f \rangle$ is easier to evaluate.

Let $g(x)$ be some function with $g(x) > 0$ for all $x$. Then

$$
\langle f \rangle_p = \int \mathrm{d}x\,f(x)\,p(x) = \int \mathrm{d}x\,\frac{f(x)}{g(x)}\,p(x)\,g(x) = c \int \mathrm{d}x\,\frac{f(x)}{g(x)}\,q(x) = c \left\langle \frac{f}{g} \right\rangle_q ,
\tag{18.2}
$$

where we defined the function $q(x) = p(x)\,g(x)/c$ and $c$ is chosen in such a way
that $\int \mathrm{d}x\,q(x) = 1$. We note that $g(x)$ can be any positive function. Hence, such an
approach might be interesting in two different scenarios: (i) if it is easier to sample
from the distribution $q(x)$ rather than from $p(x)$ and, (ii) if such a sampling results
in a variance reduction, which is equivalent to a decrease in error, so that fewer random
numbers have to be sampled to obtain comparable results.

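Let us briefly elaborate on this point: we have

$$
\mathrm{var}\left(\frac{f}{g}\right)_q = \left\langle \left(\frac{f}{g}\right)^2 \right\rangle_q - \left\langle \frac{f}{g} \right\rangle_q^2 ,
\tag{18.3}
$$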
and for the particular choice $g(x) = f(x)$ we obtain that

$$
\mathrm{var}\left(\frac{f}{g}\right)_q = 0 ,
\tag{18.4}
$$

and the error of our Monte Carlo integration vanishes. However, the ideal case of
$g(x) = f(x)$ is unrealistic because we obtain for the normalization constant $c$

$$
c = \int \mathrm{d}x\,p(x)\,g(x) = \langle g \rangle_p = \langle f \rangle_p ,
\tag{18.5}
$$

which is exactly the integral we want to evaluate. Nevertheless, the function $g(x)$
can be adapted to improve the result. We choose $g(x)$ in such a way that the integral
$\langle g \rangle$ is easily evaluated and that $g(x)$ follows $f(x)$ as closely as possible; in other
words, the quotient $f(x)/g(x)$ becomes as constant as possible. This means that we
no longer sample from $p(x)$ within a given interval but only from points which are
of importance for the particular function $f(x)$. Such an approach is referred to as
importance sampling [6-9].
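
The variance reduction can be illustrated with a small example in which all quantities are known in closed form. The choices below are assumptions made for illustration only: $p(x) = e^{-x}$ on $x \geq 0$, $f(x) = x^2$ (so that $\langle f \rangle_p = 2$) and $g(x) = x$. Then $c = \langle g \rangle_p = 1$ and $q(x) = x e^{-x}$ is a Gamma density with shape parameter 2, which the Python standard library can sample directly. The function names are hypothetical.

```python
import random


def mean_and_variance(values):
    """Sample mean and unbiased sample variance of a list of numbers."""
    m = sum(values) / len(values)
    var = sum((v - m) ** 2 for v in values) / (len(values) - 1)
    return m, var


def plain_sampling(n, rng):
    """Estimate <f>_p for f(x) = x**2 by sampling x from p(x) = exp(-x)."""
    values = [rng.expovariate(1.0) ** 2 for _ in range(n)]
    return mean_and_variance(values)


def importance_sampling(n, rng):
    """Estimate <f>_p = c * <f/g>_q with g(x) = x, cf. Eq. (18.2).

    Here c = <g>_p = 1 and q(x) = p(x) g(x) / c = x exp(-x), i.e. a Gamma
    density with shape 2 and scale 1. The sampled quantity is c * f/g = c * x.
    """
    c = 1.0
    values = [c * rng.gammavariate(2.0, 1.0) for _ in range(n)]
    return mean_and_variance(values)


if __name__ == "__main__":
    rng = random.Random(0)
    print("plain:     ", plain_sampling(50000, rng))
    print("importance:", importance_sampling(50000, rng))
```

Both estimators target the same value, but the variance of the sampled quantity drops from $\mathrm{var}(x^2)_p = 20$ to $\mathrm{var}(x)_q = 2$, so roughly ten times fewer samples are needed for the same statistical error.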

The attentive reader might have observed that, at first glance, importance
sampling has nothing to do with MARKOV-chain Monte Carlo methods in general.
Nevertheless, it can be demonstrated that MARKOV-chain Monte Carlo methods
correspond indeed to importance sampling.

To prove this, we remember that MARKOV-chain Monte Carlo techniques are
based on the generation of a sequence of configurations $S^{(n)}$:

$$
S^{(1)} \to S^{(2)} \to \cdots \to S^{(n)} \to \cdots
\tag{18.6}
$$

Each individual configuration $S^{(n)}$ is generated from the previous configuration
$S^{(n-1)}$ at random with a certain transition probability $P(S^{(n-1)} \to S^{(n)})$. These
transition probabilities obey

$$
P(S \to S') \geq 0 \quad \text{and} \quad \sum_{S'} P(S \to S') = 1 ,
\tag{18.7}
$$

and this property ensures that the sequence (18.6) is a MARKOV-chain. In Sect. 16.4
we observed that the condition of detailed balance for a stationary distribution $P(S)$

$$
P(S)\,P(S \to S') = P(S')\,P(S' \to S)
\tag{18.8}
$$

guarantees convergence of the MARKOV-chain toward the stationary distribution.
Hence, the remaining task is to find transition probabilities which fulfill detailed
balance. In a typical situation, the transition probabilities can be written as

$$
P(S \to S') = P_p(S \to S')\,P_a(S \to S') ,
\tag{18.9}
$$
where $P_p(S \to S')$ is the probability that a configuration $S'$ is proposed and $P_a(S \to
S')$ is the probability that the proposed configuration is accepted. In many cases one
simplifies the situation by assuming that

$$
P_p(S \to S') = P_p(S' \to S) ,
\tag{18.10}
$$

and, thus, the condition of detailed balance changes into

$$
P(S)\,P_a(S \to S') = P(S')\,P_a(S' \to S) .
\tag{18.11}
$$

The Metropolis algorithm uses one possible choice for the acceptance probability,
namely

$$
P_a(S \to S') = \min\left[1, \frac{P(S')}{P(S)}\right] .
\tag{18.12}
$$

It was demonstrated in Sect. 14.3 that Eq. (18.12) indeed fulfills Eq. (18.11). The
execution of the algorithm has already been illustrated in Chap. 15 in its application
to the numerics of the ISING model.

A rather straightforward generalization of the METROPOLIS algorithm (18.12)
is found when an asymmetric proposal probability $P_p(S \to S')$ is considered. It is
easily demonstrated that the choice

$$
P_a(S \to S') = \min\left[1, \frac{P(S')\,P_p(S' \to S)}{P(S)\,P_p(S \to S')}\right]
\tag{18.13}
$$

also fulfills detailed balance (18.8). The choice (18.13) is referred to as the
Metropolis-Hastings algorithm [10] (see footnote 1).

By  exploiting  the  MARKOV  property  in  order  to  sample  configurations  according 
to  the  Boltzmann  distribution  we  perform  importance  sampling  as  illustrated 
above.  An  alternative  approach  would  be  to  select  different  configurations  according 
to a uniform distribution, which obviously increases the numerical cost of the method
by orders of magnitude. Hence, sampling with the help of a MARKOV-chain yields a variance
reduction  in  comparison  to  the  crude  approach  of  simple  sampling.  Furthermore,  the 
algorithm  can  be  optimized  by  a  clever  choice  of  Pp(S  ->  S')  which  does  not  need 
to  be  symmetric.  Clearly,  this  choice  will  have  to  depend  on  the  particular  problem 
at  hand. 
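
The role of the proposal ratio in Eq. (18.13) can be made explicit with a small example. The following sketch is an illustration under stated assumptions and not an algorithm taken from the text: the target is the unnormalized density $P(x) = x e^{-x}$ on $x > 0$, and the proposal is a multiplicative log-normal step $x' = x \exp(s\,\xi)$ with $\xi \sim \mathcal{N}(0, 1)$. This proposal is asymmetric, and its ratio is $P_p(x' \to x)/P_p(x \to x') = x'/x$. All names and the step width are hypothetical choices.

```python
import math
import random


def log_target(x):
    """Logarithm of the unnormalized target P(x) = x * exp(-x), x > 0."""
    return math.log(x) - x


def metropolis_hastings(n_samples, step=0.8, x0=1.0, seed=0):
    """Metropolis-Hastings sampling with an asymmetric log-normal proposal."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_samples):
        # Propose x' = x * exp(step * xi); the proposal never leaves x > 0.
        x_new = x * math.exp(step * rng.gauss(0.0, 1.0))
        # log of [P(x') P_p(x' -> x)] / [P(x) P_p(x -> x')], cf. Eq. (18.13),
        # with proposal ratio P_p(x' -> x) / P_p(x -> x') = x' / x.
        log_ratio = (log_target(x_new) - log_target(x)
                     + math.log(x_new) - math.log(x))
        # Accept with probability min(1, ratio); 1 - random() lies in (0, 1].
        if math.log(1.0 - rng.random()) < min(0.0, log_ratio):
            x = x_new
        samples.append(x)
    return samples


if __name__ == "__main__":
    samples = metropolis_hastings(100000, seed=1)
    print("sample mean:", sum(samples) / len(samples))
```

The normalization constant of $P(x)$ never enters, only ratios do. Dropping the factor $x'/x$ would turn the update into the plain Metropolis rule (18.12), which is not valid for this asymmetric proposal and would sample a different distribution.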

We shall briefly discuss two alternative approaches to MARKOV-chain Monte Carlo sampling, namely Gibbs sampling [11] and slice sampling [12]: Suppose we want to sample a sequence of $m$-dimensional variables $x^{(n)} = (x_1^{(n)}, x_2^{(n)}, \ldots, x_m^{(n)})^T$ from a multivariate distribution function $p(x) = p(x_1, x_2, \ldots, x_m)$. In such a case


1 Please note that it is common in the literature to refer even to Eq. (18.12) as a Metropolis-Hastings algorithm, despite the fact that here $P_p(S' \to S) = P_p(S \to S')$.




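Gibbs sampling is particularly interesting if the joint distribution function is well known and its conditional distributions are simple to sample. Each component of the new vector $x^{(n+1)}$ is drawn from the conditional distribution given the most recent values of all other components:

$$ x_j^{(n+1)} \sim p\left( x_j \,\middle|\, x_1^{(n+1)}, \ldots, x_{j-1}^{(n+1)}, x_{j+1}^{(n)}, \ldots, x_m^{(n)} \right) . \qquad (18.14) $$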


This is possible as we have

$$ p(x_j | x_1, \ldots, x_{j-1}, x_{j+1}, \ldots, x_m) = \frac{p(x_1, \ldots, x_m)}{p(x_1, \ldots, x_{j-1}, x_{j+1}, \ldots, x_m)} \propto p(x_1, \ldots, x_m) , \qquad (18.15) $$

because the denominator in Eq. (18.15) is independent of $x_j$. It can, therefore, be treated as a normalization constant when $x_j$ is sampled.
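As a minimal sketch of Gibbs sampling, consider a bivariate Gaussian with zero means, unit variances and correlation coefficient `rho`. This example distribution and all names are assumptions made for illustration. Its conditional distributions are again Gaussian, $x_1 | x_2 \sim \mathcal{N}(\rho x_2, 1 - \rho^2)$, so each component can be drawn directly.

```python
import math
import random


def gibbs_bivariate_gauss(rho, n_steps, rng, start=(0.0, 0.0)):
    """Gibbs sampler for a bivariate Gaussian with unit variances."""
    sigma = math.sqrt(1.0 - rho * rho)  # width of the conditional pdf
    x1, x2 = start
    samples = []
    for _ in range(n_steps):
        x1 = rng.gauss(rho * x2, sigma)  # x1 given the current x2
        x2 = rng.gauss(rho * x1, sigma)  # x2 given the updated x1
        samples.append((x1, x2))
    return samples


rng = random.Random(2)
samples = gibbs_bivariate_gauss(0.6, 20000, rng)
# For unit variances <x1 x2> equals the correlation coefficient rho.
corr = sum(a * b for a, b in samples) / len(samples)
```

Every proposed value is accepted; the price is that consecutive samples are correlated, the more so the closer `rho` is to one.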

Let us briefly discuss slice sampling: For reasons of simplicity we shall regard the univariate case where $p(x)$ denotes the pdf from which we would like to sample. We apply the following algorithm:

1.  Choose some initial value $x_0$.

2.  Sample a uniformly distributed random variable $y_0$ from the interval $[0, p(x_0)]$.

3.  Sample the next random variable $x_1$ uniformly from the slice $\mathcal{S}[p^{-1}(y_0)]$, i.e. from the set of all $x$ with $p(x) \ge y_0$.

4.  Sample a uniformly distributed random number $y_1 \in [0, p(x_1)]$.

5.  Sample the next random variable $x_2$ uniformly from the slice $\mathcal{S}[p^{-1}(y_1)]$.

6.  ...


The final sequence $\{x_j\}$ is constructed by ignoring the $y_n$-values (even steps). This procedure is illustrated schematically in Fig. 18.1.
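The alternating steps can be sketched for a case in which the slice is known in closed form. The example assumes the unnormalized Gaussian $p(x) = \exp(-x^2/2)$, for which the set of all $x$ with $p(x) \ge y$ is the interval $|x| \le \sqrt{-2 \ln y}$. Both this choice and the function name are illustrative.

```python
import math
import random


def slice_sample_gauss(n_steps, rng, x0=0.0):
    """Slice sampling of p(x) = exp(-x**2 / 2), normalization not needed."""
    x = x0
    xs = []
    for _ in range(n_steps):
        # Vertical step: y uniform in (0, p(x)]; 1 - random() avoids y = 0.
        y = math.exp(-0.5 * x * x) * (1.0 - rng.random())
        # Horizontal step: x uniform on the slice {x : p(x) >= y}.
        half_width = math.sqrt(-2.0 * math.log(y))
        x = rng.uniform(-half_width, half_width)
        xs.append(x)  # only the x values are kept, the y values are dropped
    return xs


rng = random.Random(3)
xs = slice_sample_gauss(20000, rng)
mean = sum(xs) / len(xs)
var = sum(v * v for v in xs) / len(xs) - mean * mean
```

For a general pdf the slice has to be found numerically, which is the main practical difficulty of the method.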


Fig.  18.1  Schematic 
illustration  of  slice  sampling 
for  the  uni-variate  case 
described  by  the  pdf  p(x). 
The  relevant  steps  of  the 
algorithm  are  indicated  by 
solid  asterisks 
18.3  The  Potts  Model

We studied in Chap. 15 the two-dimensional Ising model as an example for the Metropolis algorithm. Here we expand on this discussion and investigate the $q$-states Potts model [5] which can be regarded as a generalization of the Ising model. The model is characterized by the Hamilton function

$$ H = -\sum_{ij} J_{ij}\, \delta_{\sigma_i \sigma_j} , \qquad (18.16) $$


where the notation used for the Ising model applies. In particular, we shall regard the case $J_{ij} = J$ for $i, j$ nearest neighbors and $J_{ij} = 0$ otherwise. In contrast to the ISING model, the spin realizations $\sigma_i$ on grid-point $i$ can take integer values $\sigma_i = 1, 2, \ldots, q$. For $q = 2$, the POTTS model is equivalent to the ISING model which can be easily proved by rewriting the Hamilton function as

$$ H = \frac{J}{2} \sum_{\langle ij \rangle} 2 \left( \frac{1}{2} - \delta_{\sigma_i \sigma_j} \right) - \sum_{\langle ij \rangle} \frac{J}{2} . \qquad (18.17) $$

Here $\langle ij \rangle$ denotes the sum over nearest neighbors. We observe that $2 \left( \frac{1}{2} - \delta_{\sigma_i \sigma_j} \right)$ is equal to $-1$ for $\sigma_i = \sigma_j$ and $+1$ for $\sigma_i \ne \sigma_j$. Moreover, the constant energy shift in Eq. (18.17) can be neglected and we recover the Ising model of Chap. 15.

The method to calculate the observables of interest, like $\langle E \rangle$, $\langle M \rangle$, $c_h$ and $\chi$, and the basic algorithm can be adopted from Sect. 15.2 as is. There is one important difference. It occurs in step 3 of the algorithm: Instead of setting $\sigma_{ij} = -\sigma_{ij}$ we sample the new value of $\sigma_{ij}$ uniformly distributed from $1, 2, \ldots, q$ under exclusion of the old value of $\sigma_{ij}$.
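The modified update can be sketched as follows. The details of the algorithm of Sect. 15.2 are not repeated here, so this sketch makes its own assumptions: random site selection, periodic boundary conditions, and the Hamilton function (18.16) with nearest-neighbor coupling $J$. All names are hypothetical.

```python
import math
import random


def potts_sweep(spins, q, J, beta, rng):
    """One Metropolis sweep for H = -J * sum_<ij> delta(s_i, s_j).

    spins is an n x n list of lists with values 1..q; it is modified in place.
    Periodic boundary conditions are an assumption of this sketch.
    """
    if q < 2:
        return  # for q = 1 there is no other value to propose
    n = len(spins)
    for _ in range(n * n):
        i, j = rng.randrange(n), rng.randrange(n)
        old = spins[i][j]
        # Propose uniformly among the q - 1 values different from the old one.
        new = rng.randrange(1, q)
        if new >= old:
            new += 1
        neighbors = (spins[(i + 1) % n][j], spins[(i - 1) % n][j],
                     spins[i][(j + 1) % n], spins[i][(j - 1) % n])
        # Energy change: each equal neighbor pair contributes -J.
        d_energy = -J * (sum(s == new for s in neighbors)
                         - sum(s == old for s in neighbors))
        if d_energy <= 0.0 or rng.random() < math.exp(-beta * d_energy):
            spins[i][j] = new


rng = random.Random(4)
lattice = [[rng.randrange(1, 5) for _ in range(8)] for _ in range(8)]
for _ in range(20):
    potts_sweep(lattice, 4, 0.5, 1.0 / 0.47, rng)
```

Because the proposal excludes the old value, it is still symmetric and the plain Metropolis acceptance (18.12) applies.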

Figures 18.2, 18.3, 18.4, and 18.5 display the mean energy per particle, the mean magnetization per particle $\langle m_1 \rangle$ [with $\varrho = 1$ in Eq. (18.18)], the heat capacity $c_h$ as well as the magnetic susceptibility $\chi$ for $q = 1, 2, \ldots, 8$ and $J = 0.5$ [Eq. (18.16)] vs temperature $k_B T$. The size of the system was $N \times N$ with $N = 40$. We performed $10^4$ measurements per temperature and 10 sweeps were discarded between two successive measurements in order to reduce correlations. The equilibration time was set to 102 3 sweeps. A typical spin configuration for $q = 4$ and $k_B T = 0.47$ can be found in Fig. 18.6.

A number of interesting details can be observed in Fig. 18.3. First of all, we recognize that the mean magnetization $\langle m_1 \rangle$ above the critical temperature decreases


2 We calculate the magnetization in a particular spin state $\varrho$ via


(18.18) 
Fig. 18.2 The mean energy per particle $\langle \varepsilon \rangle$ vs temperature $k_B T$ for a $q$-states Potts model on a $40 \times 40$ square lattice, with $q = 1, 2, \ldots, 8$ and $J = 0.5$. $10^4$ measurements have been performed

Fig. 18.3 The mean magnetization per particle $\langle m_1 \rangle$ vs temperature $k_B T$ for a $q$-states Potts model on a $40 \times 40$ square lattice, for $q = 1, 2, \ldots, 8$ and $J = 0.5$. $10^4$ measurements have been performed


with increasing values of $q$. The reason is that the mean magnetization $\langle m_1 \rangle$ represents for $T \gg T_c$ the probability of finding a particular spin in state $\sigma_i = 1$. This is equivalent to $1/q$ for a uniform distribution and therefore decreases with increasing $q$. Please note that the expectation value of the magnetization $\langle m_\varrho \rangle$ is restricted to take the values from $\{0, 1\}$ for $T \ll T_c$ due to the modified definition (18.18) of the magnetization. This is in contrast to the ISING model where $\langle m \rangle \in \{-1, 1\}$ for $T < T_c$. Hence, we obtain for $T \ll T_c$ $\langle m_1 \rangle = 0$ with probability $(q - 1)/q$ and $\langle m_1 \rangle = 1$ with probability $1/q$. However, the particular definition (18.18) of the magnetization is not
Fig. 18.4 The heat capacity $c_h$ vs temperature $k_B T$ for a $q$-states Potts model on a $40 \times 40$ square lattice, with $q = 1, 2, \ldots, 8$ and $J = 0.5$. $10^4$ measurements have been performed. The inset shows the specific heat $c_h$ on a logarithmic scale in the region around the transition temperature

Fig. 18.5 The magnetic susceptibility $\chi$ vs temperature $k_B T$ for a $q$-states Potts model on a $40 \times 40$ square grid, with $q = 1, 2, \ldots, 8$ and $J = 0.5$. $10^4$ measurements have been performed. The inset shows the magnetic susceptibility $\chi$ on a logarithmic scale in the region around the transition temperature


important since the physically relevant property of the POTTS model HAMILTON function (18.16) is its $Z_q$ symmetry with a degenerate ground state.

A second interesting feature is the observation that the critical temperature $T_c$ also decreases with increasing values of $q$ which becomes particularly transparent from Figs. 18.4 and 18.5. The critical temperatures are quoted in Table 18.1. Finally, we deduce from Fig. 18.2 that the phase transition is smoother for $q = 2$ and becomes discontinuous for large values of $q$. In particular, the $q$-states POTTS model
Fig. 18.6 A typical spin configuration $\sigma_{ij}$ for a $q$-states Potts model on a $40 \times 40$ square lattice with $q = 4$, $J = 0.5$, and $k_B T = 0.47$


Table 18.1 List of the critical values $\alpha_c = J / (k_B T_c)$ of the $q$-states Potts model for $q = 2, 3, \ldots, 8$

$q$    $\alpha_c$
2      0.89
3      1.00
4      1.09
5      1.16
6      1.22
7      1.28
8      1.35

exhibits a second order phase transition for $q = 2, 3, 4$ and a first order phase transition for $q > 4$ which is hard to see from Figs. 18.2 and 18.3. However, there is another method to unambiguously identify a first order phase transition. It is referred to as the histogram technique. The mean energies of consecutive measurements near the critical temperature are simply collected in a histogram. If only one peak is observed, the system fluctuates around a single phase, and the phase transition is of second order. However, the existence of two or more peaks means that the system fluctuates between two or more different phases and, therefore, exhibits a first order phase transition. Figure 18.7 displays two histograms for $q = 2$ ($k_B T = 0.56$) and $q = 8$ ($k_B T = 0.37$) from $10^4$ measurements to prove our case.

One possible realization of the $q = 3$ states Potts model was discussed by M. Kardar and A.N. Berker [13]. They studied the oversaturated adsorption of Krypton atoms on a graphite surface. A detailed analysis of this system revealed that three energetically degenerate sublattices are formed. Furthermore, the authors demonstrated that the thermodynamic properties of this system can be explained by a $q = 3$ states POTTS model. For a more detailed discussion we refer to the original paper. More applications of the POTTS model were discussed in the review by F.Y. Wu [14].
Fig. 18.7 (a) Histogram of $10^4$ measurements of the absolute value of the mean energy per particle $|\langle \varepsilon \rangle|$ for a $q$-states Potts model on a $40 \times 40$ square lattice at temperature $k_B T = 0.56$ with $q = 2$. We observe one single peak which indicates that the system exhibits a second order phase transition. (b) The same as (a) but for $q = 8$ and temperature $k_B T = 0.37$. We observe two well separated peaks, thus the system exhibits a first order phase transition


The  attentive  reader  may  have  noticed  that  our  results  do  not  carry  error-bars. 
We  neglected  error-bars  for  a  clearer  illustration.  A  short  discussion  of  methods 
used  to  calculate  numerical  errors  was  presented  in  Sect.  15.2  for  the  Ising  model 
and  they  can  easily  be  adapted  for  the  Potts  model.  More  advanced  techniques  will 
be  introduced  in  the  next  chapter. 


18.4  Advanced  Algorithms  for  the  POTTS  Model 

We discuss briefly some advanced techniques for the Potts model. Although these algorithms are applicable for arbitrary $q$ we restrict our discussion to the case $q = 2$, the ISING model, for reasons of simplicity. Let us briefly motivate the need for more advanced methods: For large values of $N$ we observe the formation of spin domains³ for temperatures $T \approx T_c$. In such a case the specific Metropolis algorithm used so far is disadvantageous because single spin flips will only affect the boundaries of these domains (critical slowing down). It is therefore necessary to perform many sweeps in order to produce configurations which are entirely different. It might


3 These are regions in which all spins point in the same direction, the so-called WEISS domains [15].

Fig. 18.8 Schematic illustration of the identification of clusters according to the SWENDSEN-WANG algorithm. 1 and 2 denote two different spin orientations, bonds are denoted by solid lines and all bonded spins form clusters


therefore  be  a  better  approach  to  flip  a  whole  spin  cluster  at  once.  Such  algorithms 
are  referred  to  as  cluster  algorithms.  The  main  problem  is  the  identification  of 
clusters  as  well  as  the  assignment  of  a  probability  to  the  flip  of  a  particular  cluster. 

As  a  first  example  we  shall  discuss  the  Swendsen-Wang  algorithm  [16].  The 
algorithm  is  executed  in  the  following  steps: 

1.  Identify  all  links  between  two  neighboring  identical  spins. 

2.  Define a bond between two linked spins with probability

$$ P = 1 - \exp(-2 \beta J) , \qquad (18.19) $$

with $\beta = 1 / (k_B T)$.

3.  Identify  all  clusters  which  are  built  from  spins  connected  by  bonds,  see  Fig.  18.8. 

4.  Flip  every  cluster  with  probability  1/2. 

5.  Delete  the  bonds  and  restart  the  iteration  for  the  next  spin  configuration. 
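The five steps above can be sketched for the ISING case with spins $\pm 1$. The cluster identification uses a small union-find structure, which is one common choice and not prescribed by the text. Periodic boundaries and all names are assumptions of this sketch.

```python
import math
import random


def swendsen_wang_step(spins, J, beta, rng):
    """One Swendsen-Wang update of an n x n Ising lattice (spins +1 / -1).

    The lattice is modified in place; the number of clusters is returned.
    Periodic boundaries are assumed, which requires n >= 3.
    """
    n = len(spins)
    parent = list(range(n * n))  # union-find forest over the lattice sites

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    p_bond = 1.0 - math.exp(-2.0 * beta * J)  # Eq. (18.19)
    # Steps 1 and 2: visit every link once (right and lower neighbor).
    for i in range(n):
        for j in range(n):
            for k, l in (((i + 1) % n, j), (i, (j + 1) % n)):
                if spins[i][j] == spins[k][l] and rng.random() < p_bond:
                    root_a, root_b = find(i * n + j), find(k * n + l)
                    if root_a != root_b:
                        parent[root_a] = root_b
    # Steps 3 and 4: every cluster (tree root) is flipped with probability 1/2.
    flip = {}
    for i in range(n):
        for j in range(n):
            root = find(i * n + j)
            if root not in flip:
                flip[root] = rng.random() < 0.5
            if flip[root]:
                spins[i][j] = -spins[i][j]
    # Step 5: the bonds live only in `parent` and are discarded on return.
    return len(flip)


rng = random.Random(6)
lattice = [[rng.choice((-1, 1)) for _ in range(16)] for _ in range(16)]
cluster_counts = [swendsen_wang_step(lattice, 1.0, 0.44, rng) for _ in range(10)]
```

At high temperature almost every spin is its own cluster, at low temperature a few clusters span the lattice; the count returned here makes this easy to monitor.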

We  note  the  following  properties  of  the  Swendsen-Wang  algorithm: 

•  The algorithm is ergodic because every spin forms a cluster on its own with a non-vanishing probability according to Eq. (18.19).

•  The  algorithm  fulfills  detailed  balance  for  the  Boltzmann  distribution  and  thus 
reproduces  the  correct  stationary  distribution. 

Since the algorithm breaks domain walls or flips whole clusters, it can be regarded as very efficient from a numerical point of view. However, it outperforms the single spin METROPOLIS algorithm only for temperatures near the critical temperature because only then spin domains dominate the observables.

A  simpler  version  of  this  algorithm  consists  of  the  following  four  steps: 

1.  Randomly pick a lattice site.

2.  Find  all  neighbors  with  the  same  spin  and  form  bonds  with  probability  (18.19). 

3.  Move  to  the  boundary  of  the  cluster  and  repeat  step  2,  i.e.  the  cluster  grows. 

4.  If  no  new  bond  is  formed,  flip  the  cluster  with  probability  1. 
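A minimal sketch of these four steps for the ISING case follows. The cluster is grown from the seed site with a stack; periodic boundaries and the names used are assumptions made for this illustration.

```python
import math
import random


def wolff_step(spins, J, beta, rng):
    """Grow one cluster from a random seed and flip it with probability 1.

    spins is an n x n list of lists with entries +1 / -1, modified in place.
    Returns the size of the flipped cluster.
    """
    n = len(spins)
    p_bond = 1.0 - math.exp(-2.0 * beta * J)  # Eq. (18.19)
    seed = (rng.randrange(n), rng.randrange(n))
    s0 = spins[seed[0]][seed[1]]
    cluster = {seed}
    frontier = [seed]  # sites whose neighbors still have to be examined
    while frontier:
        i, j = frontier.pop()
        for site in (((i + 1) % n, j), ((i - 1) % n, j),
                     (i, (j + 1) % n), (i, (j - 1) % n)):
            k, l = site
            if site not in cluster and spins[k][l] == s0 \
                    and rng.random() < p_bond:
                cluster.add(site)
                frontier.append(site)
    for i, j in cluster:
        spins[i][j] = -s0  # the whole cluster is flipped at once
    return len(cluster)


rng = random.Random(9)
lattice = [[1] * 16 for _ in range(16)]
sizes = [wolff_step(lattice, 1.0, 0.44, rng) for _ in range(50)]
```

No labeling of all clusters is needed, and large clusters are picked preferentially because the seed site is chosen uniformly.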

In  this  simplified  version  the  identification  of  clusters  is  not  necessary  because  each 
cluster  is  built  dynamically  during  the  simulation.  Such  a  formulation  of  a  cluster 
algorithm is the WOLFF algorithm [17] which is essentially a generalization of the work of SWENDSEN and WANG [16]. But why is it allowed to accept every step, i.e. flip every formed cluster, without contradicting the condition of detailed balance? The explanation is found in the definition of the probability of bond-formation, Eq. (18.19). With this choice the ratio of the probabilities of building a given cluster in the forward and in the reverse move compensates the ratio of the BOLTZMANN weights of the two configurations, so that detailed balance holds with acceptance probability one. An even more effective extension of the WOLFF algorithm to quantum systems is the loop algorithm [18]. For a more detailed discussion of all these methods we refer the interested reader to the literature [19] and to the particular papers cited here.

Before proceeding to the next chapter, let us briefly mention that there is also an entirely different approach to improve the METROPOLIS algorithm for quantum systems, the so-called worm algorithms [18]. However, a detailed discussion of such algorithms is beyond the scope of this book.


Summary 

The dominant topic of this chapter was importance sampling, a method to improve Monte Carlo methods by reducing the variance. In this method some hard to sample pdf was approximated as closely as possible by another, easy to sample pdf and one concentrated on intervals which particularly matter for an as accurate as possible estimate of, for instance, an expectation value of some property $f(x)$. In this sense MARKOV-chain Monte Carlo techniques corresponded to importance sampling as long as detailed balance was obeyed. In this particular case the MARKOV-chain was known to approach the equilibrium distribution which need not be known in detail. The METROPOLIS algorithm with its symmetric proposal probability was one possible realization of MARKOV-chains which obeyed detailed balance. Another method was the METROPOLIS-HASTINGS algorithm with its asymmetric proposal probability. It also obeyed detailed balance and improved the variance over the 'classical' Metropolis algorithm. The second part of this chapter was dedicated to the simulation of the $q$-state Potts model, an extension of the Ising model. The Potts model had the feature that it developed a second order phase transition for $q \le 4$ and a first order phase transition for $q > 4$. Moreover, the transition temperature was $q$-dependent. The numerical simulation of the physics of this model proved to be able to pick up on all these particular features. Finally, some advanced algorithms developed for a more precise handling of various properties of spin-models, particularly around the phase transition, have been presented without going into great detail.
Problems

1.  Modify the program designed to solve the Ising model (see Problems in Chap. 15) in such a way that the physics of the $q$-states POTTS model can be simulated for arbitrary values of $q$. Try to reproduce the figures presented within this chapter. In order to investigate the order of the phase transition, plot the internal energy per particle $\langle \varepsilon \rangle$ for $T \approx T_c$ in a histogram for different measurements.

The critical temperatures listed in Table 18.1 for $q = 2, 3, \ldots, 8$ can be used to validate your code.

2.  Include a non-zero external field $h$ and study its influence on the physics of the $q$-states Potts model for different values of $q$.
Chapter  19

Data  Analysis 


19.1  Introduction 


It is the aim of this chapter to present some of the most important techniques of statistical data analysis which are of interest for experimental as well as theoretical sciences. In particular, the superstition that numerically generated data sets do
not  need  to  be  analyzed  with  statistical  methods  is  certainly  not  justified  if  the 
data  was  generated  by  Monte  Carlo  methods.  Some  simple  methods  of  statistical 
analysis  have  already  been  discussed  in  previous  chapters.  For  instance,  in  Chap.  12 
we  discussed  simple  quality  tests  for  random  number  generators,  in  Chap.  15  we 
calculated  the  errors  associated  with  the  observables  of  the  Ising  model.  Here,  these 
simple  methods  will  be  summarized  and  some  more  advanced  techniques  will  be 
introduced  on  a  basic  level.  For  a  more  advanced  discussion  of  this  topic  we  refer 
the  interested  reader  to  Refs.  [1-5]. 


19.2  Calculation  of  Errors 

We repeat briefly the basics of simple estimators which we made use of previously. We approximate the expectation value $\langle x \rangle$ of some variable $x$

$$ \langle x \rangle = \int dx\, x\, p(x) , \qquad (19.1) $$

where $p(x)$ is a pdf, by its arithmetic mean

$$ \bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i , \qquad (19.2) $$

where the numbers $x_i$ follow the distribution $p(x)$. It is of conceptual importance to distinguish between the expectation value $\langle x \rangle$, which is a c-number, and the estimator $\bar{x}$, which is a random number fluctuating around $\langle x \rangle$. The error of approximating $\langle x \rangle$ by $\bar{x}$ can be estimated by calculating the variance

$$ \mathrm{var}(\bar{x}) = \frac{\mathrm{var}(x)}{N} = \frac{\langle x^2 \rangle - \langle x \rangle^2}{N} , \qquad (19.3) $$

if the random numbers $x_i$ are uncorrelated (see Appendix E). In case of correlated data the treatment becomes more involved and this will be discussed in Sect. 19.3. The expectation values $\langle x^2 \rangle$ and $\langle x \rangle$ in Eq. (19.3) may again be replaced by the corresponding estimators $\overline{x^2}$ and $\bar{x}$ in order to obtain a reasonable estimate of the variance $\mathrm{var}(x)$. In particular, we approximate

$$ \mathrm{var}(x) \approx \overline{x^2} - \bar{x}^2 . \qquad (19.4) $$


This approximation has already been applied in our investigation of the ISING model, Chap. 15. When dealing with MARKOV-chain Monte Carlo simulations, the result (19.3) can be interpreted in a rather trivial way: Repeating the simulation under identical conditions results in roughly 68 % of all simulations to yield a mean value $\bar{x} \in [\langle x \rangle - \sigma_{\bar{x}}, \langle x \rangle + \sigma_{\bar{x}}]$, where $\sigma_{\bar{x}} = \sqrt{\mathrm{var}(\bar{x})}$ is the standard error.
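For uncorrelated data the estimate of the mean and of its standard error takes only a few lines. The function name is a choice made for this sketch, and the variance is the plain (not bias corrected) estimate of the mean of the squares minus the square of the mean.

```python
import math


def mean_and_standard_error(data):
    """Arithmetic mean and standard error for uncorrelated data points."""
    n = len(data)
    mean = sum(data) / n
    mean_sq = sum(x * x for x in data) / n
    # Variance of x from the estimators of <x^2> and <x>;
    # max() guards against tiny negative values caused by rounding.
    var_x = max(mean_sq - mean * mean, 0.0)
    # The variance of the mean is var(x) / N, Eq. (19.3).
    return mean, math.sqrt(var_x / n)


mean, error = mean_and_standard_error([1.0, 2.0, 3.0, 4.0])
```

For the four numbers used here the mean is 2.5 and the variance of $x$ is 1.25, which gives a standard error of $\sqrt{1.25/4} \approx 0.559$.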

We now consider the by now quite familiar situation in which the underlying pdf $p(x)$ of a sequence of random numbers $\{x_i\}$ is unknown. In such a case one cannot simply use a particular estimator without some knowledge of the particular form of $p(x)$. A common way to proceed is the poor person's assumption: The underlying distribution is symmetric. This assumption has its origin in the central limit theorem (see Appendix, Sect. E.8). However, some intuitive checks may be required if fatal misconceptions are to be avoided. If the data set is reasonably large, one can retrieve essential information from collecting the data points in form of a histogram or, if the index $i$ refers to time instances, by plotting a time sequence.

We  can  deduce  a  first  idea  about  the  form  of  the  underlying  pdf  from  a  histogram. 
For  instance,  if  the  data  set  displays  only  one  peak,  as  in  Fig.  19.1,  quantities  like 
the  mean  or  the  variance  could  be  useful.  But  if  there  are  two  (or  more)  separate 
peaks,  as  in  Fig.  19.2,  it  does  not  necessarily  make  sense  to  calculate  the  mean  or 
variance by summing over all the data points. Such a situation can, for instance, occur in statistical spin models with two phases, as we observed in the $q$-state Potts model, Fig. 18.7a, b.

Time series, in which the data points $x_i$ are plotted as a function of discrete time instances $t_i$, can also reveal important information about the properties of the
data  set.  For  instance,  systematic  trends,  outliers,  or  hints  for  correlations  may  be 
observed. 
Fig.  19.1  Histogram  generated  by  random  sampling  of  a  Gaussian  of  mean  zero  and  variance  one


Fig.  19.2  Histogram  generated  by  random  sampling  of  two  Gaussians  of  mean  zero  and  variance 
one,  displaced  by  +3  and  —3,  respectively 


Let us turn our attention to some more advanced estimator techniques. So far we discussed the sample mean and sample variance as candidates for unbiased estimators.¹ In a more general context the calculation of observables from data sets might be more complex. In the following we assume a data set of $N$ data points $(x_1, x_2, \ldots, x_N)$. Basically, we would like to estimate a quantity of the form $f(\langle x \rangle)$


1 Since mean and variance are calculated from the same data points, they are usually not unbiased. Therefore a common choice is the so-called bias corrected variance $\mathrm{var}(x)_B$ which is given by $\mathrm{var}(x)_B = \frac{N}{N-1} \mathrm{var}(x)$ where $N$ is the number of data points. A more detailed discussion can be found in any textbook on statistics [6-9].




where $f$ is some particular function (for instance $\langle x \rangle^2$). A bad (biased) estimate would be to calculate

$$ \bar{f} = \frac{1}{N} \sum_{i=1}^{N} f(x_i) , \qquad (19.5) $$

which is definitely not the quantity we are interested in because for $N \to \infty$ we have $\bar{f} \to \langle f \rangle$ and not $f(\langle x \rangle)$. A better estimate would be to calculate

$$ f(\bar{x}) = f\left( \frac{1}{N} \sum_{i=1}^{N} x_i \right) , \qquad (19.6) $$

which converges to $f(\langle x \rangle)$ for $N \to \infty$. We discuss here two different methods to calculate the error attached to $f(\bar{x})$, namely the Jackknife method and the statistical bootstrap method.

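We define Jackknife averages

$$ x_i^J = \frac{1}{N - 1} \sum_{j \ne i} x_j , \qquad (19.7) $$

i.e. $x_i^J$ is the average of all values $x_j \ne x_i$. Moreover, we define

$$ f_i^J = f(x_i^J) , \qquad (19.8) $$

and this opens the possibility to estimate $f(\langle x \rangle)$ following

$$ f(\langle x \rangle) \approx \bar{f}^J = \frac{1}{N} \sum_i f_i^J , \qquad (19.9) $$

with the statistical error

$$ \sigma_{\bar{f}^J} = \sqrt{N - 1} \left[ \overline{(f^J)^2} - \left( \bar{f}^J \right)^2 \right]^{1/2} , \qquad (19.10) $$

which can be written as

$$ \sigma_{\bar{f}^J}^2 = \frac{N - 1}{N} \sum_i \left( f_i^J - \bar{f}^J \right)^2 , \qquad (19.11) $$

for uncorrelated $f_i^J$ (see Appendix E).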
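The Jackknife procedure can be sketched as follows. The sketch uses the standard form of the method: leave-one-out averages, the function evaluated at each of them, and an error given by $(N-1)/N$ times the sum of squared deviations. The function name and the example data are illustrative.

```python
import math


def jackknife(data, f):
    """Jackknife estimate of f(<x>) and of its statistical error."""
    n = len(data)
    total = sum(data)
    # f evaluated at the leave-one-out averages (all points except x_i).
    f_jack = [f((total - x) / (n - 1)) for x in data]
    f_mean = sum(f_jack) / n
    # Error: (N - 1) / N times the sum of squared deviations.
    variance = (n - 1) / n * sum((fj - f_mean) ** 2 for fj in f_jack)
    return f_mean, math.sqrt(variance)


data = [1.0, 2.0, 3.0, 4.0]
estimate, error = jackknife(data, lambda x: x)       # f is the identity
square_estimate, square_error = jackknife(data, lambda x: x * x)
```

For the identity function the Jackknife error reduces to the familiar $\sqrt{s^2/N}$ with the bias corrected variance $s^2$, here $\sqrt{(5/3)/4} \approx 0.645$, which is a convenient check of an implementation.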

In the case of the statistical bootstrap we consider again a set of $N$ data points $\{x_i\}$. We randomly choose $N$ elements from this data set without removal, i.e. with replacement, which constitutes the set $\{x_j^{(i)}\}$ and calculate for these $N$ points the observable $f_i = f\left( \frac{1}{N} \sum_j x_j^{(i)} \right)$. This procedure is repeated $M$-times and we get

$$ f(\langle x \rangle) \approx \bar{f}_{BS} = \frac{1}{M} \sum_{i=1}^{M} f_i , \qquad (19.12) $$




and

$$ \sigma_{\bar{f}_{BS}}^2 = \frac{1}{M - 1} \sum_{i=1}^{M} \left( f_i - \bar{f}_{BS} \right)^2 . \qquad (19.13) $$


This  method  was  applied  in  Chap.  15  to  determine  estimates  for  the  error-bars 
of  the  various  observables  as  a  function  of  temperature  in  Fig.  15.6.  The  methods 
discussed  here  can,  of  course,  also  be  employed  to  derive  estimates  for  the  errors 
attached  to  the  various  observables  studied  in  the  Potts  model,  Chap.  18. 
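A minimal sketch of the statistical bootstrap is given below. The error is taken as the sample standard deviation of the $M$ bootstrap values; the normalization $1/(M-1)$, the function name and the synthetic Gaussian data are assumptions of this sketch.

```python
import math
import random


def bootstrap(data, f, n_resamples, rng):
    """Bootstrap estimate of f(<x>) and of its statistical error."""
    n = len(data)
    values = []
    for _ in range(n_resamples):
        # Draw N points with replacement from the original data set.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        values.append(f(sum(resample) / n))
    f_bs = sum(values) / n_resamples
    # Spread of the bootstrap values (1 / (M - 1) is a choice made here).
    variance = sum((v - f_bs) ** 2 for v in values) / (n_resamples - 1)
    return f_bs, math.sqrt(variance)


rng = random.Random(12)
data = [rng.gauss(0.0, 1.0) for _ in range(400)]
estimate, error = bootstrap(data, lambda x: x, 500, rng)
```

For 400 uncorrelated points of unit variance the error of the mean should come out close to $1/\sqrt{400} = 0.05$.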

Let  us  close  this  section  with  a  short  comment  on  systematic  errors.  As  already 
highlighted  within  Chap.  1  one  also  has  to  be  aware  of  possible  systematic  errors. 
Like  in  experimental  data,  these  errors  are  more  easily  overlooked  in  numerical  data 
since  they  are  rather  hard  to  identify.  In  general,  there  is  no  method  available  to 
investigate  systematic  errors.  For  instance,  in  the  simulation  of  the  ISING  model,  the 
main  source  of  errors  was  that  the  MARKOV-chain  was  not  allowed  to  completely 
equilibrate  which  would  have  been  equivalent  to  running  the  simulation  forever. 
The  introduction  of  the  concept  of  an  auto-correlation  time  will,  at  least,  allow  for  a 
systematic  investigation  of  this  fundamental  problem. 


19.3  Auto-Correlations 

The situation becomes more involved whenever the random numbers of the sequence $\{x_i\}$ are correlated, i.e. $\mathrm{cov}(x_i, x_j) \ne 0$ for $i \ne j$ [see Appendix, Eq. (E.16)], where the elements of the series $\{x_i\}$ are successive members of a time series. Hence, existing covariances between elements $x_i$ and $x_j$ account for auto-correlations of a certain observable between different time steps. We rewrite Eq. (19.3):

$$ \mathrm{var}(\bar{x}) = \frac{1}{N^2} \sum_{i=1}^{N} \mathrm{var}(x_i) + \frac{1}{N^2} \sum_{i \ne j} \mathrm{cov}(x_i, x_j) . \qquad (19.14) $$
The first term on the right-hand side of Eq. (19.14) is identified as $\mathrm{var}(x_i)/N$ which is assumed to be identical for all $i$, i.e. $\mathrm{var}(x_i) = \mathrm{var}(x)$. Furthermore, we rewrite the sum

$$ \sum_{i \ne j} \to 2 \sum_{i=1}^{N} \sum_{j=i+1}^{N} $$

and obtain

$$ \mathrm{var}(\bar{x}) = \frac{1}{N} \left[ \mathrm{var}(x) + \frac{2}{N} \sum_{i=1}^{N} \sum_{j=i+1}^{N} \mathrm{cov}(x_i, x_j) \right] . \qquad (19.15) $$

Let us assume time translational invariance:

$$ \mathrm{cov}(x_i, x_j) = C(j - i) , \quad \text{for } j > i . \qquad (19.16) $$

We apply this relation to Eq. (19.15) and obtain

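$$ \mathrm{var}(\bar{x}) = \frac{1}{N} \left[ \mathrm{var}(x) + \frac{2}{N} \sum_{i=1}^{N} \sum_{j=i+1}^{N} C(j - i) \right] = \frac{1}{N} \left[ \mathrm{var}(x) + \frac{2}{N} \sum_{k=1}^{N} C(k) (N - k) \right] = \frac{1}{N} \left[ \mathrm{var}(x) + 2 \sum_{k=1}^{N} C(k) \left( 1 - \frac{k}{N} \right) \right] , \qquad (19.17) $$

which can be reformulated as:

$$ \mathrm{var}(\bar{x}) = \frac{2\, \mathrm{var}(x)\, \tau_x^i}{N} . \qquad (19.18) $$

We introduced here the (proper) integrated auto-correlation time $\tau_x^i$

$$ \tau_x^i = \frac{1}{2} + \sum_{k=1}^{N} A(k) \left( 1 - \frac{k}{N} \right) , \qquad (19.19) $$

and the normalized auto-correlation function

$$ A(k) = \frac{C(k)}{C(0)} = \frac{\mathrm{cov}(x_i, x_{i+k})}{\mathrm{var}(x_i)} . \qquad (19.20) $$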
In most cases we are interested in the limit $N \to \infty$ of Eq. (19.19):

$$ \tau_x^i \to \lim_{N \to \infty} \tau_x^i = \frac{1}{2} + \sum_{k=1}^{\infty} A(k) . \qquad (19.21) $$

The form of the auto-correlation function $A(k)$ can be approximated using the results of Sect. 16.4. There, we observed that the stationary distribution $\pi$ was the left-eigenvector of the transition matrix $P$ with eigenvalue 1, Eq. (16.73). Let $\{\varphi_i\}$ denote the set of all left-eigenvectors of the matrix $P$ with eigenvalues $\lambda_i$, i.e. $\varphi_i P = \lambda_i \varphi_i$.² Then some arbitrary state $q(0)$ can be expressed in this basis as:

$$ q(0) = \sum_i \alpha_i \varphi_i . \qquad (19.22) $$

After $n$ consecutive time-steps we arrive at state $q(n)$

$$ q(n) = q(0) P^n = \sum_i \alpha_i \varphi_i P^n = \sum_i \alpha_i \lambda_i^n \varphi_i , \qquad (19.23) $$

which follows from Eq. (16.62). We denote the observable we want to calculate by $O(n)$ and expand it according to Ref. [10]

$$ O(n) = \sum_i [q(n)]_i\, o_i = \sum_i \alpha_i \lambda_i^n o_i , \qquad (19.24) $$

where $o_i$ stands for the expectation value of $O$ in the $i$-th eigenstate $\varphi_i$. For large $n$ the value of $O(n)$ will be dominated by the largest eigenvalue of $P$, say $\lambda_0$, and we denote this value by $O(\infty) = \alpha_0 o_0$. This allows us to rewrite Eq. (19.24) as

$$ O(n) = O(\infty) + \sum_{i \ne 0} \alpha_i \lambda_i^n o_i . \qquad (19.25) $$

Let $\lambda_1 \in \mathbb{R}$ be the second largest eigenvalue and let us define the exponential auto-correlation time $\tau_x^e$ via

$$ \tau_x^e = -\frac{1}{\log(\lambda_1)} , \qquad (19.26) $$


2 Note that since $P$ is a stochastic matrix, it follows that $|\lambda_i| \le 1$ for all $i$. Furthermore, it can be shown that the largest eigenvalue of a stochastic matrix is equal to 1.




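and the value of $O(n)$ can, for large values of $n$, be approximated by

$$ O(n) \approx O(\infty) + \beta \exp\left( -\frac{n}{\tau_x^e} \right) , \qquad (19.27) $$

where $\beta$ is some constant. Hence, the auto-correlation obeys

$$ C(n) \propto [O(0) - O(\infty)] [O(n) - O(\infty)] \propto \beta \exp\left( -\frac{n}{\tau_x^e} \right) , \qquad (19.28) $$

and we can simply set for the auto-correlation function $A(k)$

$$ A(k) = \gamma \exp\left( -\frac{k}{\tau_x^e} \right) , \qquad (19.29) $$

where $\gamma$ is some constant. We use this result in the expression for the integrated auto-correlation time (19.21) and arrive at:

$$ \tau_x^i = \frac{1}{2} + \gamma \sum_{k=1}^{\infty} \exp\left( -\frac{k}{\tau_x^e} \right) = \frac{1}{2} + \gamma\, \frac{\exp\left( -\frac{1}{\tau_x^e} \right)}{1 - \exp\left( -\frac{1}{\tau_x^e} \right)} . \qquad (19.30) $$

For $\tau_x^e \gg 1$ the exponential function can be expanded into a Taylor series. Keeping terms up to first order results in:

$$ \tau_x^i \approx \frac{1}{2} + \gamma\, \frac{1 - \frac{1}{\tau_x^e}}{\frac{1}{\tau_x^e}} = \frac{1}{2} + \gamma (\tau_x^e - 1) \approx \gamma \tau_x^e . \qquad (19.31) $$

However, we note that in general relation (19.31) is only a poor approximation because usually the exponential auto-correlation time is very different from the integrated auto-correlation time.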

Let us briefly discuss our results. A comparison between Eqs. (19.3) and (19.18) reveals that due to correlations in the time series, the number of effective (or useful) data points $N_{\mathrm{eff}}$ can be determined from

$$ N_{\mathrm{eff}} = \frac{N}{2 \tau_x^i} . \qquad (19.32) $$

In the limit $\tau_x^e \to 0$ we obtain $\tau_x^i = 1/2$ and, thus, recover Eq. (19.3). The effective number of measurements is the relevant quantity whenever the error of a Monte Carlo integration is calculated.
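The quantities of this section can be estimated directly from a time series. The sketch below computes the normalized auto-correlation function, sums it up to a cut-off `k_max` (the cut-off is a practical assumption, since the estimate of $A(k)$ becomes noisy for large $k$) and derives the effective number of data points. As a test case it uses a synthetic series $x_{t+1} = \phi x_t + \xi_t$ with Gaussian noise $\xi_t$, for which $A(k) = \phi^k$ and hence $\tau_x^i = 1/2 + \phi/(1 - \phi)$. All names are illustrative.

```python
import random


def autocorrelation(data, k_max):
    """Normalized auto-correlation function A(k) for k = 1, ..., k_max."""
    n = len(data)
    mean = sum(data) / n
    c0 = sum((x - mean) ** 2 for x in data) / n
    result = []
    for k in range(1, k_max + 1):
        ck = sum((data[i] - mean) * (data[i + k] - mean)
                 for i in range(n - k)) / (n - k)
        result.append(ck / c0)
    return result


def integrated_time(data, k_max):
    """Integrated auto-correlation time, the sum truncated at k_max."""
    return 0.5 + sum(autocorrelation(data, k_max))


def effective_samples(data, k_max):
    """Number of effective data points, N / (2 * tau)."""
    return len(data) / (2.0 * integrated_time(data, k_max))


# Synthetic correlated series (an assumption of this sketch).
rng = random.Random(14)
phi = 0.5
x, series = 0.0, []
for _ in range(20000):
    x = phi * x + rng.gauss(0.0, 1.0)
    series.append(x)

tau = integrated_time(series, 20)       # exact value: 0.5 + 0.5 / 0.5 = 1.5
n_eff = effective_samples(series, 20)   # roughly a third of the 20000 points
```

Fitting the values returned by `autocorrelation` with an exponential gives the exponential auto-correlation time instead.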
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In  another  approach,  one  can  determine  the  exponential  auto-correlation  time  xex 
and  use  it  to  estimate  the  number  of  steps  that  should  be  neglected  between  two 
successive  measurements.  This  can  be  achieved  by  fitting  the  auto-correlation  A (k) 
with  an  exponential  function.  (A  brief  introduction  to  least  squares  fits  can  be  found 
in  Appendix  H.)  We  note  that  in  one  and  the  same  system  the  auto-correlation  times 
may  be  very  different  for  different  observables. 
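The estimate of $\tau_{int}$ and $N_{eff}$ from a measured time series can be sketched as follows. This is only a minimal illustration under stated assumptions: the test data are a synthetic AR(1) series (not taken from this book), for which $A(k) = \rho^k$ and hence $\tau_{ex} = -1/\ln\rho$; the function names and the rule of truncating the sum at the first non-positive $A(k)$ are our own choices.

```python
import numpy as np

def autocorrelation(x, kmax):
    """Normalized auto-correlation A(k) = C(k)/C(0) for k = 0, ..., kmax."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    n = len(x)
    c0 = np.dot(d, d) / n
    return np.array([np.dot(d[:n - k], d[k:]) / ((n - k) * c0)
                     for k in range(kmax + 1)])

def integrated_time(a):
    """tau_int = 1/2 + sum_{k>=1} A(k), truncated at the first non-positive A(k)."""
    tau = 0.5
    for ak in a[1:]:
        if ak <= 0.0:
            break
        tau += ak
    return tau

# Synthetic correlated test series (assumption: AR(1) process with A(k) = rho**k)
rng = np.random.default_rng(0)
rho, n = 0.9, 200_000
x = np.empty(n)
x[0] = rng.normal()
for i in range(1, n):
    x[i] = rho * x[i - 1] + rng.normal()

a = autocorrelation(x, 200)
tau_int = integrated_time(a)
n_eff = n / (2.0 * tau_int)          # Eq. (19.32)
tau_ex = -1.0 / np.log(rho)          # exact exponential time of the AR(1) series
print(tau_int, tau_ex, n_eff)
```

For this particular series $\gamma = 1$, so that the exact value $\tau_{int} = 1/2 + \rho/(1-\rho) = 9.5$ is indeed close to $\tau_{ex} \approx 9.49$; for data from a real simulation the two times may differ considerably.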


19.4  The  Histogram  Technique 

The histogram technique is a method which allows one to approximate the expectation
value of some observable for temperatures near a given temperature $T_0$ without
performing further MARKOV-chain Monte Carlo simulations. The basic idea is
easily sketched. Suppose the observable $O$ is solely a function of the energy $E$. We
perform a MARKOV-chain Monte Carlo simulation for a given temperature $T_0$ and
measure the energy $E$ several times. The resulting measurements are sorted into a
histogram with bin width $\Delta E$ as was demonstrated in Sect. 18.3. If $n(E)$ denotes
the number of configurations measured within the interval $(E, E + \Delta E)$, then the
probability that some energy is measured to lie within the interval $(E, E + \Delta E)$ is
given by

$$P_H(E, T_0) = \frac{n(E)}{M}, \tag{19.33}$$

where the index $H$ refers to histogram and $M = \sum_E n(E)$ is the number of
measurements. However, we note that this probability can also be expressed by the
Boltzmann distribution


$$P(E, T) = \frac{N(E) \exp\left(-\frac{E}{k_B T}\right)}{\sum_E N(E) \exp\left(-\frac{E}{k_B T}\right)}, \tag{19.34}$$

where $N(E)$ denotes the number of micro-states within the interval $(E, E + \Delta E)$.
$N(E)$ is independent of the temperature $T$ and relation (19.34) is valid for all
temperatures $T$. In particular, for $T = T_0$

$$P_H(E, T_0) = P(E, T_0), \tag{19.35}$$


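which immediately yields

$$N(E) = \alpha\, n(E) \exp\left(\frac{E}{k_B T_0}\right), \tag{19.36}$$

where $\alpha$ is some constant and we emphasize that $n(E)$ was measured at $T_0$. Inserting
Eq. (19.36) into (19.34) yields

$$P(E, T) = \frac{n(E) \exp\left[-\left(\frac{1}{k_B T} - \frac{1}{k_B T_0}\right) E\right]}{\sum_E n(E) \exp\left[-\left(\frac{1}{k_B T} - \frac{1}{k_B T_0}\right) E\right]} \tag{19.37}$$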


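for arbitrary $T$. The expectation value $\langle O \rangle_T$ of the observable $O$ at some temperature
$T$ can now be determined from

$$\langle O \rangle_T = \sum_E O(E)\, P(E, T) = \frac{\sum_E O(E)\, n(E) \exp\left[-\left(\frac{1}{k_B T} - \frac{1}{k_B T_0}\right) E\right]}{\sum_E n(E) \exp\left[-\left(\frac{1}{k_B T} - \frac{1}{k_B T_0}\right) E\right]}. \tag{19.38}$$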


This result implies that it is not necessary to run an additional MARKOV-chain
Monte Carlo simulation in order to compute the expectation value $\langle O \rangle_T$ for a
temperature $T$ if $T$ is in the vicinity of $T_0$. However, if $T$ deviates strongly from
$T_0$, the above procedure (19.38) does not provide a good approximation because
the relevant configurations at $T$ may have been very improbable at $T_0$ and may,
therefore, not have been sampled sufficiently often in the original MARKOV-chain
Monte Carlo simulation.
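A minimal sketch of the reweighting formula (19.38) might look as follows. The test data are an assumption made purely for illustration: we take a system with constant density of states $N(E)$ and $k_B = 1$, so that energies sampled at $T_0$ follow an exponential distribution and $\langle E \rangle_T = T$ is known exactly. The function name is hypothetical.

```python
import numpy as np

def histogram_reweight(counts, centers, obs, T0, T, kB=1.0):
    """Estimate <O>_T from a histogram n(E) measured at T0, Eq. (19.38)."""
    x = -(1.0 / (kB * T) - 1.0 / (kB * T0)) * centers
    w = counts * np.exp(x - x.max())   # shifting the exponent avoids overflow
    return np.sum(obs * w) / np.sum(w)

# Synthetic 'measurements' at T0 (assumption: N(E) = const, hence P(E) ~ exp(-E/T0))
rng = np.random.default_rng(1)
T0 = 1.0
energies = rng.exponential(scale=T0, size=200_000)

counts, edges = np.histogram(energies, bins=400)      # n(E) with bin width Delta E
centers = 0.5 * (edges[:-1] + edges[1:])

e_at_T0 = histogram_reweight(counts, centers, centers, T0, T0)
e_at_T = histogram_reweight(counts, centers, centers, T0, 0.9)
print(e_at_T0, e_at_T)   # to be compared with the exact values T0 and 0.9
```

Choosing $T$ far away from $T_0$ in this sketch shifts almost all of the weight into the sparsely populated tail of the histogram, which illustrates the limitation discussed above.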


Summary 

Data analysis is an important but often neglected part of natural sciences and
in particular of numerical simulations. It consists mainly of consistency checks
and error analysis. This chapter concentrated in a first step on error analysis. It
discussed the most common methods to arrive at an estimate of the error involved
whenever expectation values of some property are analyzed. These went beyond
all those methods which have already been discussed in some detail throughout
this book. In a second step auto-correlations have been discussed. They should be
part of consistency checks and give valuable information about possible systematic
errors. The auto-correlation analysis is of particular importance whenever the
quality of the sequence of random numbers is crucial to a particular simulation.
(Experiments in which the events are expected to be random, like radioactive decay,
also fall into this category.) Moreover, this method proved to be very useful in
MARKOV-chain Monte Carlo simulations as it allowed us to define and determine an
auto-correlation time which could serve as a measure of the number of sweeps which
have to be skipped between two consecutive measurements. Finally, the histogram
technique was introduced as a method of data interpolation. It allowed us, in addition
to applications which have already been presented within this book, to derive the
expectation value of some property at some 'temperature' $T$ from the already known
expectation value of this same property at some other temperature $T_0$ if $T \approx T_0$ and
if the equilibrium distribution was known.


Problems 

1. Calculate the auto-correlation function for random numbers generated by the two
linear congruential generators discussed in Sect. 12.2. Check also the random
number generator provided by your system. Discuss the results.

2. Potts model: Calculate the error attached to the specific heat $c_h$ and the
susceptibility $\chi$ using the Jackknife method for all values of $q = 1, \ldots, 8$. Plot
the corresponding diagrams and discuss the results. Determine the exponential
and integrated correlation time.


Chapter 20

Stochastic Optimization


20.1  Introduction 

Suppose $x \in S$ is some vector in an $n$-dimensional search space $S$ and let
$H: S \to \mathbb{R}$ be a mapping from the search space $S$ onto the real axis $\mathbb{R}$. The function $H$
plays a particular role and is usually referred to as the cost function. A minimization
problem can be defined in a very compact form:

Find $x_0 \in S$, such that $H(x_0)$ is the global minimum of the cost function $H$.

Analogously, a maximization problem with cost function $H$ defines a minimization
problem with cost function $G = -H$. The class of both problems is referred to
as the class of optimization problems [1-3] and only minimization problems will be
discussed here.

The  reader  might  be  aware  that  there  are  numerous  applications  in  physics  and 
related  sciences.  We  list  a  few  in  order  to  remind  ourselves  of  their  fundamental 
importance: 

•  The set of linear equations $Ax = b$ is often regarded as the minimization problem
$H(x) = \|Ax - b\|^2$, which can be beneficial for high dimensional problems.

•  The quantum mechanical ground state energy $E_0$ is given by

$$E_0 = \min_{\Psi} \frac{\langle \Psi | \hat{H} | \Psi \rangle}{\langle \Psi | \Psi \rangle}, \tag{20.1}$$

where $|\Psi\rangle$ denotes the wave function and $\hat{H}$ is the Hamiltonian of the system.

•  High  dimensional  and  highly  non-linear  least  squares  fits.  (More  details  can  be 
found  in  Appendix  H.) 

•  The  equilibrium  crystal  structure  of  solids  is  obtained  by  minimization  of  the  free 
energy. 

•  Protein folding is described by minimization of the forces in a molecular
dynamics problem.

Whenever the cost function is at least once differentiable, methods of deterministic
optimization can be applied [4]. (Two simple deterministic optimization
methods are presented in Appendix I.) On the other hand, if $H$ is not differentiable
or too complex, due to a huge search space $S$ or many local minima, methods of
stochastic optimization [5] can be employed. The term stochastic optimization is
used for methods which contain at least one step which is based on random number
generation. Let us briefly give some examples of problems for which deterministic
methods fail:

•  The Traveling Salesperson Problem [6, 7]: A traveling salesperson has to visit $L$
cities in a tour as short as possible under the constraint that he/she has to return
to the starting point in the end. Each city has to be visited exactly once, hence the
cities have to be ordered in such a way that the travel length becomes a global
minimum. In particular, the cost function

$$H(\{i\}) = \sum_{\ell=1}^{L} \left| x_{i_{\ell+1}} - x_{i_\ell} \right| \tag{20.2}$$

has to be minimized. Here $\{i\}$ denotes a certain configuration of cities and we set
$i_{L+1} = i_1$. Obviously, we cannot calculate the first derivative of $H$ with respect
to $\{i\}$, set it to zero, and solve the problem in the classical way. On the other hand,
a brute force approach of calculating $H(\{i\})$ for all possible arrangements $\{i\}$ is
not feasible since there are $L!$ different possible routes. Since for one particular
choice all $L$ starting points and both travel directions yield the same result, we
have to calculate $L!/(2L) = (L-1)!/2$ different configurations $\{i\}$. We would
have about $10^{155}$ different choices for $L = 100$ cities! This clearly makes such
an approach intractable.

•  The  arrangement  of  timetables  under  certain  constraints.  In  particular,  the  design 
of  timetables  in  schools,  universities  or  at  airports.  This  problem  is  also  referred 
to  as  the  Nurse  Scheduling  Problem  [8]. 

•  The ISING spin glass [9]: In contrast to the classical ISING model, the ISING spin
glass is characterized by nearest neighbor interactions $J_{ij}$ which are, in the
simplest case, chosen to be $J_{ij} = +1$ and $J_{ij} = -1$ with the same probability.
In this case the ground state below the critical temperature is not simply given
by a configuration in which all spins point in the same direction. Of course, the
ground state configuration in such a case can be highly degenerate. The fact that
such a model can be simulated using MARKOV-chain Monte Carlo methods as
they have been discussed within Chaps. 15 and 18 gives us some idea of how one
may employ stochastic methods to solve optimization problems.

•  The N-Queens Problem [10]: Place $N$ queens on an $N \times N$ chessboard in such a way
that no two queens attack each other. In particular, this means that two queens are
not allowed to share the same row, the same column, or the same diagonal. It
can be shown that the problem possesses solutions for $N \ge 4$. One defines a
function $H(\{n\})$ which counts the number of attacks in a certain configuration
$\{n\}$. For instance, for $N = 4$, the configuration

[Figure: 4 x 4 chessboard with four queens in a configuration with two attacks]

has $H(\{n\}) = 2$. On the contrary, the configuration

[Figure: 4 x 4 chessboard with four queens in a configuration without attacks]

solves the 4-queens problem and $H(\{n\}) = 0$.
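As an illustration of such a cost function, the following sketch counts the attacks in a configuration of the N-queens problem. The representation (one queen per row, stored as a list of column indices) and the convention of counting every pair of queens sharing a column or a diagonal once are assumptions made here; other conventions differ by which pairs are counted.

```python
from itertools import combinations

def attacks(cols):
    """Number of attacking pairs; cols[r] is the column of the queen in row r."""
    count = 0
    for (r1, c1), (r2, c2) in combinations(enumerate(cols), 2):
        # Rows differ by construction, so only columns and diagonals are checked
        if c1 == c2 or abs(c1 - c2) == abs(r1 - r2):
            count += 1
    return count

print(attacks([1, 3, 0, 2]))   # a solution of the 4-queens problem: 0 attacks
print(attacks([0, 1, 2, 3]))   # all queens on the main diagonal: 6 attacking pairs
```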

We concentrate here on some of the most basic methods of stochastic optimization:
the method of hill climbing, the method of simulated annealing, and genetic
algorithms. Ideas on which several more advanced techniques are based will be
sketched in Sect. 20.5.


20.2  Hill  Climbing 

The method of hill climbing [11] is probably one of the simplest methods
of stochastic optimization. Given a cost function $H(x)$, we execute the following
steps:

1. Choose an initial position $x_0$.

2. Randomly pick a new $x_n$ from the neighborhood of $x_{n-1}$.

3. Keep $x_n$ if $H(x_n) < H(x_{n-1})$; otherwise retain $x_{n-1}$.

4. Terminate the search if no new $x_n$ can be found in the neighborhood of $x_{n-1}$.

We note that the algorithm requires a neighborhood relation. This relation
is to be defined for each particular problem. For instance, in the case of the
traveling salesperson problem it is by no means clear what a configuration in the
neighborhood of a certain route $\{i\}$ should mean. To elaborate on this point we
concentrate here on two particular cases which help to demonstrate how such a
neighborhood relation can be defined.

In the traveling salesperson problem or in the ISING spin glass model the
neighborhood of a route $\{i\}$ or of a configuration $\mathcal{C}$ can be defined as the set
of all routes $\{i\}$ in which two cities have been interchanged or as the set of all
configurations $\mathcal{C}$ in which one spin has been flipped.

On the other hand, if the search space is $S = \mathbb{R}^n$ we may define the neighborhood
as the set of points within an $n$-sphere of radius $r$ centered at $z = x_{n-1}$. It is
rather simple to sample points from an $n$-sphere centered at the origin by applying
the method of G. Marsaglia [12]: For an $n$-dimensional vector we sample all
components $x_1, \ldots, x_n$ from the normal distribution $\mathcal{N}(0, 1)$ with mean zero and
variance one. The points are then transformed according to

$$x_j \to x_j' = \frac{r}{\|x\|}\, x_j + z_j, \tag{20.3}$$

where $\|x\|$ denotes the Euclidean norm of the vector $x$. The points given by
Eq. (20.3) lie on the surface of the $n$-sphere with radius $r$ centered at $z$. In order to obtain
uniformly distributed random points within a sphere with radius $r$ we draw a random
number $u \in [0, 1]$ and calculate

$$x_j' = u^{1/n}\, \frac{r}{\|x\|}\, x_j + z_j, \tag{20.4}$$

where the factor $1/n$ in the exponent of $u$ ensures that the points are uniformly
distributed.
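The sampling of a neighbor within an $n$-sphere of radius $r$ around $z$ can be sketched as follows. The function name and the use of NumPy's random generator are our own choices; the radial factor $u^{1/n}$ is the one discussed above.

```python
import numpy as np

def sample_ball(center, r, rng):
    """Uniformly distributed random point within the n-sphere of radius r around center."""
    z = np.asarray(center, dtype=float)
    n = z.size
    x = rng.normal(size=n)             # components drawn from N(0, 1)
    x /= np.linalg.norm(x)             # uniformly distributed direction on the unit sphere
    u = rng.random()                   # uniform random number in [0, 1)
    return z + r * u ** (1.0 / n) * x  # radial scaling with exponent 1/n

rng = np.random.default_rng(1)
z = np.array([1.0, -2.0, 0.5])
points = np.array([sample_ball(z, 2.0, rng) for _ in range(20_000)])
dist = np.linalg.norm(points - z, axis=1)
# For uniformly distributed points the fraction within radius r/2 is (1/2)**n
print(dist.max(), np.mean(dist < 1.0))
```

Omitting the exponent $1/n$ in this sketch would concentrate the points near the center, which is easily verified by monitoring the fraction of points within $r/2$.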

Let  us  briefly  summarize  the  most  important  properties  of  the  method  of  hill 
climbing: 

•  As defined, the algorithm will terminate in a local minimum, which is not
necessarily the global minimum. A classical remedy is to restart the algorithm from
various different initial positions. Information gathered from previous runs can
help to make a good choice for the initial positions of restarts.

•  Whether and how the global minimum is found depends strongly on the choice of
initial conditions. This situation is very similar to the application of deterministic
methods of optimization (see Appendix I). Sometimes it may even be of
advantage to accept points which result in a slight increase of the cost function's
value just to escape a local minimum.

•  For most problems this method is very expensive from a computational point of
view.
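The following sketch illustrates hill climbing with restarts for a toy problem. Everything specific in it is an assumption chosen for illustration: the cost function on the integers (with a local minimum at $k = 4$ and the global minimum at $k = -3$), the neighborhood $k \pm 1$, and the termination rule, which stops after a fixed number of consecutive unsuccessful proposals as a practical stand-in for step 4 of the algorithm.

```python
import random

def hill_climb(cost, x0, neighbor, rng, max_fail=200):
    """Hill climbing: accept a random neighbor only if it lowers the cost."""
    x, fx = x0, cost(x0)
    fails = 0
    while fails < max_fail:            # stop after max_fail consecutive rejections
        y = neighbor(x, rng)
        fy = cost(y)
        if fy < fx:
            x, fx, fails = y, fy, 0
        else:
            fails += 1
    return x, fx

def cost(k):
    # Toy cost function (assumption): minima at k = -3 (global) and k = 4 (local)
    return (k + 3) ** 2 * (k - 4) ** 2 + 3 * k

def step(k, rng):
    return k + rng.choice((-1, 1))     # neighborhood: the two adjacent integers

rng = random.Random(0)
single = hill_climb(cost, 10, step, rng)               # ends in the local minimum
runs = [hill_climb(cost, rng.randint(-10, 10), step, rng) for _ in range(20)]
best = min(runs, key=lambda run: run[1])               # best result of all restarts
print(single, best)
```

A single run started at $k = 10$ gets trapped in the local minimum, while the restarts from random initial positions recover the global minimum.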

We apply the method of hill climbing to the N-queens problem for $N = 8$. The
algorithm is executed in the following way: In the initial configuration the queens
are set randomly on the chessboard and we place only one queen in each row and
column. It is then checked whether or not two queens attack each other. If they do,
a new configuration is generated by picking two queens at random and by interchanging
their respective positions. This is repeated until a configuration arises in which none
of the queens attacks another. Such an algorithm resembles a random walk in a
parameter space which spans all possible configurations under the constraint that
only one queen is placed in each row and column. It is rather obvious that this
strategy is not very fast. However, one possible solution to the problem for $N = 8$
can easily be found within a few iteration steps:

[Figure: chessboard showing one solution of the 8-queens problem]

However, for large values of $N$ hill climbing is definitely not a recommendable
method to solve the N-queens problem.
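The procedure just described can be sketched as follows. The configuration is stored as a permutation (assumption: `cols[r]` is the column of the queen in row `r`), so that the constraint of one queen per row and column holds automatically and only diagonal attacks have to be counted. With the default setting every exchange is kept, as in the random walk described above; the optional flag `greedy` is our own addition and rejects exchanges which increase the number of attacks.

```python
import random

def attacks(cols):
    """Number of pairs of queens sharing a diagonal (rows and columns are distinct)."""
    n = len(cols)
    return sum(1 for r1 in range(n) for r2 in range(r1 + 1, n)
               if abs(cols[r1] - cols[r2]) == r2 - r1)

def solve_queens(n, rng, greedy=False, max_steps=100_000):
    cols = list(range(n))
    rng.shuffle(cols)                          # random start, one queen per row/column
    cost = attacks(cols)
    steps = 0
    while cost > 0 and steps < max_steps:
        i, j = rng.sample(range(n), 2)         # pick two queens at random
        cols[i], cols[j] = cols[j], cols[i]    # interchange their positions
        new_cost = attacks(cols)
        if greedy and new_cost > cost:
            cols[i], cols[j] = cols[j], cols[i]    # undo an exchange that is worse
        else:
            cost = new_cost
        steps += 1
    return cols, cost, steps

cols, cost, steps = solve_queens(8, random.Random(0))
print(cols, cost, steps)
```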


20.3  Simulated  Annealing 


Let us turn our attention to simulated annealing [7, 13, 14]. The name of this
algorithm stems from the annealing process in metallurgy in which a metal is
first heated and then slowly cooled in order to reduce the amount of defects in
the material. The reasoning behind this method can quite easily be reconstructed
with the help of the ISING model which we discussed in detail in Chap. 15. There
we learned from thermodynamics that the equilibrium distribution of possible
configurations $P(\mathcal{C}, T)$ at a certain temperature $T$ is a BOLTZMANN distribution

$$P(\mathcal{C}, T) = \frac{1}{Z} \exp\left[-\frac{H(\mathcal{C})}{k_B T}\right], \tag{20.5}$$

where $H(\mathcal{C})$ is the Hamilton function of the system. In particular, we expect that
the system is in its ground state (let us assume a non-degenerate ground state for
the time being) with probability one in the limit $T \to 0$, provided that we cooled
sufficiently slowly so that the system had enough time to equilibrate. This can be
used to solve the optimization problem: We take the cost function $H(x)$ and define
the probability for the realization of a particular state (configuration) $x_0 \in S$ in the
search space by

$$P(x_0, T) = \frac{1}{Z} \exp\left[-\frac{H(x_0)}{T}\right], \tag{20.6}$$

where $T$ is some external parameter, which we refer to as temperature for reasons
of convenience, and $Z$ denotes the normalization constant:

$$Z = \int_S \mathrm{d}x\, \exp\left[-\frac{H(x)}{T}\right]. \tag{20.7}$$

We start the procedure at some finite initial temperature $T_0 \gg 0$ and construct
a MARKOV-chain of states $\{x_n\}$ which converges towards the distribution (20.6).
We choose, of course, a sampling technique which does not require the explicit
knowledge of the normalization $Z$, such as the METROPOLIS-HASTINGS algorithm
of Sect. 18.2. As soon as the MARKOV-chain reaches its stationary distribution for a
given temperature $T$, we slightly decrease the temperature and restart the
MARKOV-chain with the last state of the previous temperature. By slowly cooling the search
MARKOV-chain, we exclude unimportant parts of the search space by decreasing
their acceptance probability. Nevertheless, the chain is given enough time to explore
the whole remaining search space at each temperature. This procedure is commonly
referred to as the classical version of simulated annealing.

It is of advantage to start with an initial temperature which allows the chain to cover
the largest part of possible states in the search space $S$. Thus, the acceptance probability
for a new state in the MARKOV-chain should be almost equal to one for all $x \in S$. If this
were not the case, some regions of the search space might be excluded from our
search routine right away due to an unlucky choice of the initial configuration. In
particular, the result might be a state in the neighborhood of the initial state of the
MARKOV-chain and it is, therefore, most likely a local minimum rather than the
global minimum.

We note that the algorithm consists of the following essential ingredients: (i) a
proposal probability for new states $x'$ within the search space $S$, (ii) an acceptance
probability $P_a(x \to x')$ for a proposed $x'$ from a previous state $x$, and (iii) a cooling
strategy $T = T(t)$, where $t$ is time. Let us briefly elaborate on these points.


(i)  Proposal  Probability 

The question of how to generate new states $x'$ from a previous state $x$ within the
search space $S$ has already been answered in the case of hill climbing, Sect. 20.2, by
defining the neighborhood of a state in search space. The corresponding proposal
probability will be denoted by $P_p(x \to x')$.


(ii)  Acceptance Probability


The acceptance probability has to be chosen in such a way that the sequence of
generated states constitutes a MARKOV-chain which converges toward the distribution
(20.6). Hence, detailed balance has to be imposed and the implications of this
requirement have been discussed extensively in Chap. 18. Note that the proposal
probability has to be included into the definition of the acceptance probability as
was outlined in Sect. 18.2.

One particular choice, the METROPOLIS-HASTINGS acceptance probability

$$P_a(x \to x', T) = \min\left[1, \frac{P(x', T)\, P_p(x' \to x)}{P(x, T)\, P_p(x \to x')}\right], \tag{20.8}$$

appears to be quite natural for several reasons:

•  It is very general and can, thus, also handle asymmetric proposal probabilities.

•  In the symmetric case $P_p(x \to x') = P_p(x' \to x)$ and for $H(x') < H(x)$ we get

$$\frac{P(x', T)}{P(x, T)} = \exp\left\{\frac{1}{T}\left[H(x) - H(x')\right]\right\} > 1, \tag{20.9}$$

according to our choice (20.6) and the state $x'$ is accepted with probability one.
On the other hand, for $H(x') > H(x)$, $x'$ may still be accepted with some
finite probability $P_a(x \to x', T)$ which offers an opportunity to escape a local
minimum.
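The acceptance probability (20.8) for the distribution (20.6) can be sketched as follows. The function and parameter names are hypothetical; `q_forward` and `q_backward` stand for $P_p(x \to x')$ and $P_p(x' \to x)$, and the normalization $Z$ cancels in the ratio so that it never has to be known.

```python
import math
import random

def acceptance(h_old, h_new, T, q_forward=1.0, q_backward=1.0):
    """P_a(x -> x', T) of Eq. (20.8) for P(x, T) proportional to exp(-H(x)/T)."""
    log_ratio = -(h_new - h_old) / T + math.log(q_backward / q_forward)
    return 1.0 if log_ratio >= 0.0 else math.exp(log_ratio)

def metropolis_step(x, cost, propose, T, rng):
    """One step of the chain for a symmetric proposal: returns the new state."""
    y = propose(x, rng)
    if rng.random() < acceptance(cost(x), cost(y), T):
        return y
    return x

print(acceptance(2.0, 1.0, 0.5))   # downhill move: accepted with probability one
print(acceptance(1.0, 2.0, 0.5))   # uphill move: probability exp(-2)

# Usage: a short chain on the integers with cost |x| at temperature T = 1
rng = random.Random(0)
x = 5
for _ in range(1000):
    x = metropolis_step(x, abs, lambda k, r: k + r.choice((-1, 1)), 1.0, rng)
```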


(iii)  Cooling  Strategy 

The design of a proper cooling strategy includes both the choice of an appropriate
initial temperature $T_0$ and the formulation of a mathematical rule which
defines $T_{n+1} = f(T_n)$ where $T_{n+1} < T_n$.

First of all we discuss the choice of the initial temperature. A common choice
is to set it in such a way that at least 80 % of all generated states are accepted.
The simplest procedure to determine this temperature starts with some arbitrary
value $T_0 > 0$ and generates $N$ states. If the number of rejected states $N_r$ is greater
than $0.2\,N$, then the temperature $T_0$ is doubled and the number of rejected states is
measured again.

Another more sophisticated choice is based on the following idea: The best
choice would be $T_0 \to \infty$ because then the acceptance probability would be one for
all possible states independent of $H(x)$. This corresponds to a random walk in search
space $S$ and we calculate the mean value $\langle H \rangle_\infty$ and the variance $\mathrm{var}(H)_\infty$. Thus,
the function values $H$ fluctuate within the interval
$[\langle H \rangle_\infty - \sqrt{\mathrm{var}(H)_\infty},\, \langle H \rangle_\infty + \sqrt{\mathrm{var}(H)_\infty}]$.
We consider now the expectation value $\langle H \rangle_{T_0}$ for large values of $T_0$. We define the
small parameter $\varepsilon = 1/T_0 \ll 1$ and find with $p(x, \varepsilon) = P(x, T_0)$

$$\langle H \rangle_\varepsilon = \int \mathrm{d}x\, p(x, \varepsilon)\, H(x) \approx \langle H \rangle_0 - \varepsilon \left[\langle H^2 \rangle_0 - \langle H \rangle_0^2\right]. \tag{20.10}$$

Re-substituting $T_0 = 1/\varepsilon$ results, finally, in:

$$\langle H \rangle_{T_0} \approx \langle H \rangle_\infty - \frac{\mathrm{var}(H)_\infty}{T_0}. \tag{20.11}$$

The initial temperature $T_0$ is now chosen in such a way that the expectation
value $\langle H \rangle_{T_0}$ borders the infinite temperature fluctuations from below and we set
consequently

$$\langle H \rangle_{T_0} = \langle H \rangle_\infty - \sqrt{\mathrm{var}(H)_\infty}, \tag{20.12}$$

with the implication that

$$T_0 = \sqrt{\mathrm{var}(H)_\infty}. \tag{20.13}$$
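Both recipes for the initial temperature can be sketched in a few lines. The function names, the sample sizes, and the toy cost functions used in the usage lines are assumptions for illustration only; `random_state` is a hypothetical user-supplied routine which returns a random state of the search space, i.e. a sample at infinite temperature.

```python
import math
import random
import statistics

def initial_temperature(cost, random_state, rng, samples=2000):
    """T0 = sqrt(var(H)_inf), Eq. (20.13), estimated from random states."""
    values = [cost(random_state(rng)) for _ in range(samples)]
    return math.sqrt(statistics.pvariance(values))

def initial_temperature_by_doubling(cost, x0, propose, rng, T=1.0, n=1000,
                                    max_reject=0.2):
    """Double T until at most max_reject * n of n proposed states are rejected."""
    while True:
        x, rejected = x0, 0
        for _ in range(n):
            y = propose(x, rng)
            dh = cost(y) - cost(x)
            if dh <= 0 or rng.random() < math.exp(-dh / T):
                x = y                  # proposal accepted
            else:
                rejected += 1
        if rejected <= max_reject * n:
            return T
        T *= 2.0

rng = random.Random(0)
# Toy example 1 (assumption): H(x) = x with x uniform in [0, 1), var(H) = 1/12
t0_var = initial_temperature(lambda x: x, lambda r: r.random(), rng)
# Toy example 2 (assumption): H(k) = |k| on the integers with neighbors k +/- 1
t0_dbl = initial_temperature_by_doubling(abs, 0, lambda k, r: k + r.choice((-1, 1)), rng)
print(t0_var, t0_dbl)
```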

We are now in a position to investigate appropriate cooling strategies: The
geometric cooling schedule

$$T_n = T_0\, q^n, \tag{20.14}$$

with $0 \ll q < 1$ is very often used. However, particular cost functions $H(x)$ may
develop several phase transitions in the course of the cooling process. Naturally, the
expectation value $\langle H \rangle$ changes rapidly in the region $T \approx T_c$, with $T_c$ the temperature
at which the phase transition occurs. It is, therefore, certainly of advantage to take
such a possibility into account and to design the cooling strategy accordingly.

Hence, a more appropriate strategy is to use temperature changes which cause
only slightly modified acceptance probabilities. In particular, we demand that

$$\frac{1}{1 + \delta} < \frac{P(x, T_n)}{P(x, T_{n+1})} < 1 + \delta, \tag{20.15}$$

with $0 < \delta \ll 1$. Assuming a Boltzmann type distribution for $P(x, T_n)$, we obtain

$$\exp\left[H(x) \left(\frac{1}{T_{n+1}} - \frac{1}{T_n}\right)\right] < 1 + \delta, \tag{20.16}$$

or

$$T_{n+1} > \frac{T_n}{1 + \frac{T_n \ln(1 + \delta)}{H(x)}}. \tag{20.17}$$

Hence, we can choose

$$T_{n+1} = \frac{T_n}{1 + \frac{T_n \ln(1 + \delta)}{3 \sqrt{\mathrm{var}(H)_{T_n}}}}, \tag{20.18}$$

where we replaced $H(x) \to 3 \sqrt{\mathrm{var}(H)_{T_n}}$. This choice is plausible if one recognizes
that we can replace $H(x) \to H(x) - H_{min}$ in the above calculations, where $H_{min}$
represents the (unknown) minimum of $H(x)$. This cooling schedule is known as the
Aarts schedule.
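The two cooling rules (20.14) and (20.18) translate directly into code. In this sketch the function names are hypothetical and `sigma` denotes $\sqrt{\mathrm{var}(H)_{T_n}}$, which in practice has to be estimated from the states generated at the temperature $T_n$.

```python
import math

def geometric(T0, q, n):
    """Geometric cooling schedule, Eq. (20.14)."""
    return T0 * q ** n

def aarts_step(T, sigma, delta):
    """Next temperature according to Eq. (20.18); sigma = sqrt(var(H)) at T."""
    return T / (1.0 + T * math.log(1.0 + delta) / (3.0 * sigma))

T, sigma, delta = 2.0, 1.0, 0.1
T_next = aarts_step(T, sigma, delta)
# For H(x) = 3 sigma the bound (20.16) is met exactly:
ratio = math.exp(3.0 * sigma * (1.0 / T_next - 1.0 / T))
print(geometric(10.0, 0.99, 2), T_next, ratio)
```

We note that the Aarts schedule automatically reduces the temperature in small steps whenever the fluctuations of $H$ are small, while the geometric schedule is insensitive to the properties of the cost function.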

Finally,  we  have  to  discuss  how  to  terminate  the  algorithm.  Typically,  there  are 
several  choices  and  we  present  briefly  the  most  popular  ones.  The  obvious  choice  is 
to  terminate  the  algorithm  as  soon  as  the  acceptance  ratio  is  below  some  predefined 
threshold  value.  A  more  sophisticated  choice  is  to  terminate  the  algorithm  whenever 
the  mean  value  (H)  reaches  some  constant  value.  A  quite  different  and  more  formal 
approach  would  be  to  initially  define  a  maximum  number  of  iterations  or  to  set 
the  final  temperature  Tf  to  some  reasonable  value.  Nevertheless,  the  termination 
condition  has  to  be  defined  for  each  particular  problem  individually. 

Before presenting an example, we note some further results associated with
cooling strategies. It was demonstrated by S. Kirkpatrick et al. [15] that the optimal
cooling strategy for a BOLTZMANN type distribution is of the form

$$T_n \propto \frac{1}{\ln(n)}, \tag{20.19}$$

where $n$ labels the temperature steps. In this case the global minimum is found
with probability one. However, the convergence is rather slow. In addition, several
extensions of classical simulated annealing have been suggested in the literature.
For instance, fast simulated annealing uses a CAUCHY distribution

$$P(x, T) \propto \frac{T}{\left(x^2 + T^2\right)^{\frac{d+1}{2}}} \tag{20.20}$$

instead of a BOLTZMANN distribution. Here $d$ is the dimension of the search space
$S$. The optimal cooling strategy for such a distribution function is of the form

$$T_n \propto \frac{1}{n}, \tag{20.21}$$

which signifies a considerable increase in convergence speed in comparison to
Eq. (20.19). Another generalization is referred to as generalized simulated annealing
and is based on the TSALLIS distribution which depends on an external
parameter $\varepsilon$:

$$P_\varepsilon(x, T) = \frac{1}{Z} \left[1 + \varepsilon\, \frac{H(x)}{k_B T}\right]^{-\frac{1}{\varepsilon}}. \tag{20.22}$$

It can be demonstrated that $P_\varepsilon$ converges toward the BOLTZMANN distribution for
$\varepsilon \to 0$. We mention in passing that the concept of the TSALLIS distribution is
closely intertwined with the definition of the TSALLIS entropy and the formulation
of non-extensive thermodynamics by C. TSALLIS [16].

As a first illustrative example we discuss the traveling salesperson problem for
$N = 36$ cities on a regular grid because in this case the optimal route is easily
identified. We calculate the initial temperature from Eq. (20.13) and employ the
geometric cooling schedule (20.14) with $q = 0.99$ together with a termination
criterion of the form

$$\left| \langle H \rangle_{T_n} - \langle H \rangle_{T_{n-1}} \right| < \eta, \tag{20.23}$$

where $\eta$ is the required accuracy. Figure 20.1a presents one route for the initial
temperature and Fig. 20.1b displays one of many optimal routes after convergence
has been reached. This case will be called the first scenario. In the second scenario
we place 36 cities in four equally spaced clusters. Results for the optimal route are
presented in Fig. 20.2b.

Fig. 20.1 (a) Initial route of the traveling salesperson for 36 cities on a regular grid, (b) One of
many optimal routes of the traveling salesperson for 36 cities on a regular grid
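A compact sketch of the first scenario might look as follows. It is not the program used to produce the figures; several details are assumptions made here: the neighborhood is the interchange of two cities, $\langle H \rangle_{T_n}$ is estimated from 100 proposals per temperature, the accuracy is set to $\eta = 10^{-6}$ (so that the run effectively stops once the chain is frozen), and the number of temperature steps is limited.

```python
import math
import random

def tour_length(tour, cities):
    """Cost function (20.2): length of the closed tour."""
    return sum(math.dist(cities[tour[k]], cities[tour[(k + 1) % len(tour)]])
               for k in range(len(tour)))

def anneal(cities, rng, q=0.99, sweeps=100, eta=1e-6, max_steps=1500):
    n = len(cities)
    tour = list(range(n))
    rng.shuffle(tour)
    # Initial temperature from Eq. (20.13), estimated from 200 random tours
    sample = []
    for _ in range(200):
        trial = tour[:]
        rng.shuffle(trial)
        sample.append(tour_length(trial, cities))
    mean = sum(sample) / len(sample)
    T = math.sqrt(sum((h - mean) ** 2 for h in sample) / len(sample))
    h = tour_length(tour, cities)
    best, h_best = tour[:], h
    prev_mean = None
    for _ in range(max_steps):
        total = 0.0
        for _ in range(sweeps):
            i, j = rng.sample(range(n), 2)
            tour[i], tour[j] = tour[j], tour[i]          # interchange two cities
            h_new = tour_length(tour, cities)
            if h_new <= h or rng.random() < math.exp(-(h_new - h) / T):
                h = h_new                                # proposal accepted
                if h < h_best:
                    best, h_best = tour[:], h
            else:
                tour[i], tour[j] = tour[j], tour[i]      # proposal rejected: undo
            total += h
        mean_h = total / sweeps
        if prev_mean is not None and abs(mean_h - prev_mean) < eta:
            break                                        # criterion (20.23)
        prev_mean = mean_h
        T *= q                                           # geometric cooling (20.14)
    return best, h_best

cities = [(i, j) for i in range(6) for j in range(6)]    # 36 cities on a regular grid
best, h_best = anneal(cities, random.Random(1))
print(h_best)   # the optimal tour on this grid has length 36
```

Whether the optimal length is actually reached depends on the random numbers and on the parameters; slower cooling or more proposals per temperature improve the result at the expense of run time.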

The possibility of phase transitions occurring during the cooling process has
already been mentioned. In a genuine physical system the question whether a phase
transition is possible at all or if it is of first or second order is solely determined by
the Hamilton function $H(x)$ of the system. As an intriguing example we refer to
the $q$-states POTTS model of Sect. 18.3 where a second order phase transition was
observed for $q \le 4$ and a first order phase transition for $q > 4$. In analogy, the order
of a 'phase transition' during the iteration process toward the global minimum in
simulated annealing is completely determined by the particular properties of the cost
function $H(x)$. We want to study such a possibility and determine the expectation
values $\langle H \rangle_T$ and the 'specific heat' $c_h$ as functions of temperature $T$ for the two
scenarios of the traveling salesperson problem. Figure 20.3 presents the results for
scenario one and Fig. 20.4 those for scenario two. The second scenario develops
two second order phase transitions while in the first scenario only one second order
phase transition can be observed. The first phase transition of the second scenario
(at $T \approx 3.5$) can be related to the optimization of the clusters' sequence while in
the second phase transition (at $T \approx 0.42$) the sequence of cities within the clusters
becomes finalized. These two transitions are indicated by down arrows labeled (1)
and (2) in Fig. 20.4.

Fig. 20.2 (a) Initial route of the traveling salesperson for 36 cities placed in four equally spaced
clusters, (b) One of many optimal routes of the traveling salesperson for 36 cities placed in four
equally spaced clusters

Fig. 20.3 (a) The expectation value $\langle H \rangle_T$ and (b) the 'specific heat' $c_h$ vs temperature $T$ for
scenario one

Fig. 20.4 The same as Fig. 20.3 but for scenario two. Two second order phase transitions are
observed. They are indicated by down arrows labeled (1) and (2)


20.4  Genetic  Algorithms 

The idea behind genetic algorithms was originally borrowed from nature's
survival of the fittest [17]. The basic intentions are quickly summarized by recalling
the natural evolution of a particular species within a hostile environment: The
individuals of the species reproduce from one generation to another. During this
process the genes of the individuals are modified by local mutations. Individuals
best adapted to the environment then survive with higher probability. This very
last process is referred to as selection. By iterating this process for large populations
the individuals of the whole species will adjust their properties to the environment
on average,¹ and, thus, the individuals will be better equipped for survival within
the hostile environment. A large population is compulsory in order to obtain a huge
variety in the phenotype of the individuals. Algorithms based on such a scheme are
referred to as genetic algorithms.

We  are  not  going  into  the  details  of  the  implementation  of  genetic  algorithms 
because  this  is  beyond  the  scope  of  this  book.  However,  the  ideas  sketched  above 
will be applied to the problem of the traveling salesperson passing through m cities
just to illustrate the method. Let $s = (s_1, \ldots, s_m) \in \mathbb{N}^m$ denote a list of m integers,
which obey $s_i \le i$. For instance, for m = 10, s might be given by

$$s = (1, 2, 2, 4, 1, 5, 3, 3, 4, 9) . \tag{20.24}$$

The order of cities is then recovered by setting $\tilde{s} = (s_m, \ldots, s_1)$ and performing the
steps illustrated in Table 20.1.

¹ Note that in the real world the environment (in particular the natural enemies of a species) develops
as well. Moreover, we do not consider any communication within a species, like the formation of
societies, learning, and related processes.

Table 20.1 Sample tour to illustrate the recovery of the order of cities within a genetic algorithm.
Elements indicated by [x] are 'selected' elements which are added to the column Tour

  s̃    Remaining list                        Tour
  9    1  2  3  4  5  6  7  8  [9]  10   →   9
  4    1  2  3  [4]  5  6  7  8  10      →   4
  3    1  2  [3]  5  6  7  8  10         →   3
  3    1  2  [5]  6  7  8  10            →   5
  5    1  2  6  7  [8]  10               →   8
  1    [1]  2  6  7  10                  →   1
  4    2  6  7  [10]                     →   10
  2    2  [6]  7                         →   6
  2    2  [7]                            →   7
  1    [2]                               →   2
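
The decoding illustrated in Table 20.1 can be sketched in a few lines of Python. This is only a minimal illustration; the function name `decode_tour` is an assumption introduced here and does not appear in the text.

```python
def decode_tour(s):
    """Recover the order of cities from a gene vector s with 1 <= s_i <= i."""
    m = len(s)
    remaining = list(range(1, m + 1))    # the list (1, 2, ..., m)
    tour = []
    for position in reversed(s):         # walk through (s_m, ..., s_1)
        # 'position' is a 1-based index into the list of remaining cities;
        # the selected city is removed from the list and appended to the tour
        tour.append(remaining.pop(position - 1))
    return tour


s = (1, 2, 2, 4, 1, 5, 3, 3, 4, 9)       # Eq. (20.24)
print(decode_tour(s))                     # [9, 4, 3, 5, 8, 1, 10, 6, 7, 2]
```

Because each gene obeys $s_i \le i$, the index is always valid for the shrinking list, so every admissible gene vector decodes to a permutation of the cities.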

In words: The reversed vector $\tilde{s}$ specifies, entry by entry, the position of the element taken from the list (1, 2, ..., m) with
removal. The resulting list Tour specifies the sequence in which the cities are visited. The
genetic algorithm is executed in the following steps:

• Define M initial individuals.

• Mutation: For each individual we introduce a single random local modification
with probability $p_{\mathrm{mut}}$.

• Reproduction: We produce M additional individuals by pairwise combining the
parents. This is performed by

(a) Pick two individuals at random.

(b) Draw a random integer $r \in [1, m-1]$ and replace the first r genes of the first
individual by the first r genes of the second individual and vice versa.

In this way, we obtain 2M individuals.

• Selection: The M individuals with the highest fitness, which corresponds to the
lowest value of the cost function, survive.

The above steps are repeated until the desired number of generations has been
reached.
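
The steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used for the figures: the function names, the default parameters (`M`, `generations`, `p_mut`, `seed`) and the test case of eight cities on a circle are all hypothetical choices, and the cost function is taken to be the length of the closed tour.

```python
import math
import random


def decode(s):
    """Gene vector s (1 <= s_i <= i) -> tour of 0-based city indices."""
    remaining = list(range(len(s)))
    return [remaining.pop(g - 1) for g in reversed(s)]


def cost(s, cities):
    """Length of the closed tour encoded by the gene vector s."""
    tour = decode(s)
    return sum(math.dist(cities[tour[k - 1]], cities[tour[k]])
               for k in range(len(tour)))


def genetic_tsp(cities, M=60, generations=150, p_mut=0.3, seed=1):
    rng = random.Random(seed)
    m = len(cities)
    # M initial individuals; gene i (1-based) is drawn from 1..i
    population = [[rng.randint(1, i) for i in range(1, m + 1)] for _ in range(M)]
    for _ in range(generations):
        # Mutation: a single random local modification with probability p_mut
        for s in population:
            if rng.random() < p_mut:
                i = rng.randrange(m)
                s[i] = rng.randint(1, i + 1)
        # Reproduction: swap the first r genes of two randomly picked parents
        children = []
        while len(children) < M:
            a, b = rng.sample(population, 2)
            r = rng.randint(1, m - 1)
            children += [b[:r] + a[r:], a[:r] + b[r:]]
        # Selection: the M individuals with the lowest cost survive
        population = sorted(population + children[:M],
                            key=lambda s: cost(s, cities))[:M]
    best = population[0]
    return decode(best), cost(best, cities)


# Hypothetical test case: 8 cities on the unit circle
cities = [(math.cos(2 * math.pi * k / 8), math.sin(2 * math.pi * k / 8))
          for k in range(8)]
tour, length = genetic_tsp(cities)
print(tour, round(length, 3))
```

Swapping the first r genes keeps the constraint $s_i \le i$ intact for both children, which is why this encoding makes the reproduction step so simple. Note that the sketch mutates every individual, including the current best one, so the best cost is not guaranteed to decrease monotonically.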

In Fig. 20.5 we show the optimal path for the traveling salesperson problem
discussed in the previous section, but now for N = 30 cities. It was obtained with
the genetic algorithm described here. The number of individuals was chosen to be
M = 5000 and the number of generations to be G = 5000.

Some remarks are appropriate: First of all we note that there are many different
ways in which a genetic algorithm can be realized. In particular, it is the
problem which determines the most convenient form to implement the essential
ingredients: mutation, reproduction, and selection. However, particular care is
required in formulating the algorithm in such a way that it does not produce
individuals which are too similar. In such a case the algorithm is very likely to
terminate in a local minimum.

Fig. 20.5 (a) The random route of one individual out of the population of 5000. (b) One of many
optimal routes of the traveling salesperson for N = 30 cities as obtained by a genetic algorithm

Another remark concerns the treatment of optimization problems with continuous
variables x. Here it might be advantageous to represent the variable x in its
binary form because it makes the reproduction step particularly simple.


20.5  Some  Further  Methods 

We  briefly  list  some  alternative  stochastic  optimization  techniques  without  going 
into  detail.  Two  famous  alternatives  which  are  closely  related  to  simulated  annealing 
are: 

• Threshold Accepting Algorithms: The new configuration x' is accepted with
probability one if $\mathcal{H}(x') < \mathcal{H}(x) + T$. During the simulation the temperature or
threshold level T is continuously decreased. The above choice of an acceptance
probability is very effective in allowing for an escape from local minima.

• Deluge Algorithms: These algorithms are very similar to threshold accepting
algorithms. We present them in the original formulation which is suited to find
the global maximum of a function G(x). The global minimum of $\mathcal{H}(x)$ can be
found by searching the maximum of $G(x) = -\mathcal{H}(x)$. One accepts a new state
x' with probability one if G(x') > T, where T is continuously increased during
the simulation. Hence, the whole landscape of G(x) is flooded with increasing T
until only the summits of G(x) are left. Finally, only the biggest mountain will
reach out of the water and the global maximum has been found.
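
A threshold accepting run can be sketched as follows. This is only an illustration under stated assumptions: the trial move (a uniform step of width 0.5), the geometric lowering of the threshold, all default parameters and the one-dimensional test function `H` are hypothetical choices that are not taken from the text.

```python
import math
import random


def threshold_accepting(H, x0, T0=2.0, cooling=0.95, levels=200, steps=50, seed=0):
    rng = random.Random(seed)
    x = best = x0
    T = T0
    for _ in range(levels):
        for _ in range(steps):
            x_new = x + rng.uniform(-0.5, 0.5)   # trial configuration x'
            if H(x_new) < H(x) + T:              # accepted with probability one
                x = x_new
                if H(x) < H(best):
                    best = x                     # remember the best state seen
        T *= cooling                             # lower the threshold level
    return best


def H(x):
    """Hypothetical cost function with several local minima."""
    return 0.1 * x**2 + math.sin(3.0 * x)


x_best = threshold_accepting(H, x0=4.0)
print(x_best, H(x_best))
```

A deluge variant follows the same pattern: one maximizes G(x) = -H(x), accepts x' whenever G(x') > T, and raises T instead of lowering it.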
Two famous ideas which are closely related to genetic algorithms are:

•  Grouping  Genetic  Algorithms :  The  idea  is  to  put  the  individuals  of  the  population 
in  distinct  groups.  These  groups  may  for  instance  be  formed  by  comparing  the 
genes  or  grouping  individuals  with  similar  cost  function  values.  All  members 
of  a  group  have  one  part  of  the  genes  in  common,  and  all  operators  acting  on 
genes  act  on  the  whole  group.  Such  an  approach  can  significantly  improve  the 
convergence  rate  of  a  classical  genetic  algorithm. 

• Ant Colony Optimization: The idea is, again, borrowed from nature, in particular
from an ant colony searching for the optimum path between two or more fixed or
variable points. In the real world an ant travels from one point to another randomly,
leaving a trail of pheromone on its traveled path. Subsequent ants are very likely
to follow the pheromone trail; however, some randomness remains. The key
point is that with time the pheromone trail starts to evaporate. Hence, its impact
on the path of subsequent ants is reduced if the path is not traveled often enough
to replenish the evaporating pheromone. In this way one prevents the
algorithm from getting stuck in a local minimum and the global minimum may be found
by sending out artificial ants.

There are many further methods available in the literature (see, for instance,
Refs. [2, 18]) to which we refer the interested reader.


Summary 

The local maximum/minimum of some cost function $\mathcal{H}(x)$ within a search space
$\mathcal{S}$ can be determined using stochastic methods, thus establishing a particular class
of algorithms known as Stochastic Optimization. The most straightforward method
was the algorithm of hill climbing which resembled a controlled random walk
within a restricted search space $\mathcal{S}$ called neighborhood. Because of this feature
hill climbing will find in general local minima within this neighborhood and the
global minimum has to be found under variation of initial conditions. This made
this method too expensive for more complex problems from a computational point
of view. To move from a random walk formulation to a formulation on the basis
of MARKOV-chain Monte Carlo was the logical next step. The method of choice
was named simulated annealing. It used the METROPOLIS-HASTINGS algorithm to
generate new configurations within a search space $\mathcal{S}$ from a temperature dependent
equilibrium distribution. A cooling strategy was used to slowly restrict the search
space to the neighborhood of the global minimum. This global minimum was always
found, albeit rather slowly. We mentioned some flavors of this basic algorithm
which either differed in the definition of the acceptance probability or in the cooling
strategy. A completely different class of algorithms was established with the so-called
genetic algorithms. They were adapted from nature's concept of the survival
of the fittest. They were based on the notions of: (i) Mutation: a single random local
modification applied with a certain probability, (ii) Reproduction: additional 'individuals' were
generated by pairwise combining parents, (iii) Selection: individuals with the lowest
value of the cost function survived and mutation started again. Genetic algorithms
established a very versatile class of solvers to cover a huge body of optimization
problems.


Problems 

Solve  the  traveling  salesperson  problem  for  N  =  20  cities  on  a  regular  grid  with 
the  help  of  simulated  annealing.  As  a  cooling  schedule,  use  the  geometric  cooling 
as  explained  in  Sect.  20.3.  Determine  the  initial  temperature  by  demanding  an 
acceptance  rate  of  90  %  and  terminate  the  algorithm  if  the  mean  value  of  the  cost 
function $\langle \mathcal{H} \rangle$ remains unchanged for at least 10 successive temperatures. Calculate
the expectation value $\langle \mathcal{H} \rangle_T$ for different temperatures and identify the transition
temperature.  In  a  second  step  produce  a  list  of  20  cities  which  are  randomly 
distributed  on  a  two-dimensional  grid.  Optimize  this  problem  as  well.  Note  that 
you  should  produce  the  list  of  cities  only  once  in  order  to  obtain  comparable  and 
reproducible  results. 
Appendix A

The  Two-Body  Problem 


Consider two mass points with positions $r_i(t) \in \mathbb{R}^3$, $i = 1, 2$ and masses $m_i$,
$i = 1, 2$. It is assumed that the point masses interact through a central potential
$U = U(|r_1(t) - r_2(t)|)$ and that external forces are neglected. Thus, the system is
closed. The explicit notation of time t is now omitted for the sake of a more compact
presentation. Furthermore, we introduce with $p_i \in \mathbb{R}^3$, $i = 1, 2$ the point masses'
momenta and the LAGRANGE function [1-5] of the system takes on the form

$$L(r_1, r_2, p_1, p_2) = \frac{p_1^2}{2 m_1} + \frac{p_2^2}{2 m_2} - U(|r_1 - r_2|) . \tag{A.1}$$

The momenta $p_i$ are replaced by

$$p_i = m_i \dot{r}_i , \quad i = 1, 2 , \tag{A.2}$$

and this yields for the LAGRANGE function (A.1)

$$L(r_1, r_2, \dot{r}_1, \dot{r}_2) = \frac{m_1}{2} \dot{r}_1^2 + \frac{m_2}{2} \dot{r}_2^2 - U(|r_1 - r_2|) , \tag{A.3}$$

where $\dot{r}_i$ denotes the time derivative of $r_i$. We note the following symmetries:
the Lagrange function is (i) translational invariant, (ii) rotational invariant, and
(iii) time invariant. We know from classical mechanics that each symmetry of
the Lagrange function corresponds to a constant of motion (a quantity that
is conserved throughout the motion) and, thus, results in a reduction of the
dimensionality of the 12-dimensional phase space.

Let us demonstrate these symmetries: In order to prove translational invariance,
we transform to center of mass coordinates which are defined as

$$R = \frac{m_1 r_1 + m_2 r_2}{m_1 + m_2} \quad \text{and} \quad r = r_2 - r_1 . \tag{A.4}$$

It is easily verified that we can express the original coordinates $r_1$ and $r_2$ with the
help of (A.4) as

$$r_1 = R - \frac{m_2}{m_1 + m_2}\, r \quad \text{and} \quad r_2 = R + \frac{m_1}{m_1 + m_2}\, r . \tag{A.5}$$

The Lagrange function (A.3) is rewritten in these new coordinates (A.4) and this
yields

$$L(r, R, \dot{r}, \dot{R}) = \frac{M}{2} \dot{R}^2 + \frac{m}{2} \dot{r}^2 - U(|r|) = L(r, \dot{r}, \dot{R}) , \tag{A.6}$$

where we introduced the total mass M and the reduced mass m:

$$M = m_1 + m_2 \quad \text{and} \quad m = \frac{m_1 m_2}{m_1 + m_2} . \tag{A.7}$$
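
The separation of the kinetic energy into a center of mass part and a relative part, which underlies Eq. (A.6), can be checked numerically. The following lines are a small sanity check only; the masses and velocities are arbitrary, hypothetical numbers.

```python
def dot(a, b):
    """Scalar product of two 3-vectors given as tuples."""
    return sum(x * y for x, y in zip(a, b))


m1, m2 = 2.0, 3.0                     # arbitrary masses
v1 = (1.0, -2.0, 0.5)                 # velocity of particle 1
v2 = (-0.5, 1.0, 2.0)                 # velocity of particle 2

M = m1 + m2                           # total mass, Eq. (A.7)
mu = m1 * m2 / M                      # reduced mass, called m in Eq. (A.7)
V = tuple((m1 * a + m2 * b) / M for a, b in zip(v1, v2))   # time derivative of R
v = tuple(b - a for a, b in zip(v1, v2))                   # time derivative of r

T_particles = 0.5 * m1 * dot(v1, v1) + 0.5 * m2 * dot(v2, v2)
T_transformed = 0.5 * M * dot(V, V) + 0.5 * mu * dot(v, v)
print(T_particles, T_transformed)     # both 13.125 up to rounding
```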

Obviously, the center of mass coordinate R plays in Eq. (A.6) the role of a cyclic
coordinate: It does not appear explicitly in the LAGRANGE function. This means
that the system is translational invariant and we can deduce from Lagrange's
equations that

$$\frac{d}{dt} \frac{\partial}{\partial \dot{R}} L = \frac{\partial}{\partial R} L = 0 , \tag{A.8}$$

and the center of mass momentum is conserved. Hence, we obtain that

$$\frac{\partial}{\partial \dot{R}} L = M \dot{R} = \text{const} , \tag{A.9}$$

with the solution

$$R(t) = A t + B , \tag{A.10}$$

where $A, B \in \mathbb{R}^3$ are constants determined by the initial conditions of the problem.
As a result, the center of mass moves along a straight line with constant velocity.
We collect all results and reformulate the Lagrange function (A.6) as

$$L(r, \dot{r}) = \frac{M}{2} A^2 + \frac{m}{2} \dot{r}^2 - U(|r|) = \tilde{L}(r, \dot{r}) + \text{const} . \tag{A.11}$$



Hence, the problem was reduced to a one-body problem with the Lagrange
function $\tilde{L}(r, \dot{r})$. In what follows the tilde is omitted and the LAGRANGE function

$$L(r, \dot{r}) = \frac{m}{2} \dot{r}^2 - U(|r|) , \tag{A.12}$$

is now studied instead of Eq. (A.11). It is an effective one-body LAGRANGE
function.

In the next step the effect of rotational invariance is investigated. Equation (A.12)
resembles the Lagrange function of a particle of mass m which is located at
position r and moves in the field of a central force $F \in \mathbb{R}^3$. This force points to the
center of the coordinate system (or points from the center of the coordinate system
to the particle). This situation is clearly invariant under a rotation of the coordinate
system since $U = U(|r|)$ depends only on the modulus of r. Consequently, r is
parallel to F for all t > 0. In such a case the vector of angular momentum $\ell \in \mathbb{R}^3$ is
conserved, since

$$\frac{d}{dt} \ell = \mathcal{M} = r \times F = 0 \quad \Rightarrow \quad \ell = \text{const} , \tag{A.13}$$

where $\mathcal{M}$ is the torque. This allows us to arbitrarily rotate our coordinate system.
We take advantage of this property and rotate it in such a way that

$$\ell = |\ell|\, e_z , \tag{A.14}$$

where $e_z$ is the unit vector in z-direction. Moreover, since the angular momentum $\ell$
is given by

$$\ell = m\, r \times \dot{r} = \text{const} , \tag{A.15}$$

and because $\ell \parallel e_z$ we conclude that $r \perp e_z$. This allows us to set z = 0 which
means that the whole motion of the point mass can be described in the x-y
plane. Rotational invariance led us to the conservation of angular momentum and
this made the reduction from a three-dimensional problem to a two-dimensional
problem possible. The particular form (A.12) of the LAGRANGE function suggests
the introduction of polar coordinates $(\rho, \varphi)$:

$$L(\rho, \dot{\rho}, \dot{\varphi}) = \frac{m}{2} \left( \dot{\rho}^2 + \rho^2 \dot{\varphi}^2 \right) - U(\rho) . \tag{A.16}$$

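We solve now Lagrange's equations (A.6) on the basis of Eq. (A.16): The first
step deals with the differential equation for the radius $\rho$

$$\frac{d}{dt} \frac{\partial}{\partial \dot{\rho}} L = \frac{\partial}{\partial \rho} L = m \rho \dot{\varphi}^2 - \frac{\partial}{\partial \rho} U(\rho) , \tag{A.17}$$

thus

$$m \ddot{\rho} - m \rho \dot{\varphi}^2 + \frac{\partial}{\partial \rho} U(\rho) = 0 . \tag{A.18}$$

The differential equation for the angle $\varphi$ follows from

$$\frac{d}{dt} \frac{\partial}{\partial \dot{\varphi}} L = \frac{d}{dt}\, m \rho^2 \dot{\varphi} = \frac{\partial}{\partial \varphi} L = 0 , \tag{A.19}$$

which corresponds to

$$\frac{d}{dt} \left( m \rho^2 \dot{\varphi} \right) = 0 . \tag{A.20}$$

Equation (A.20) is trivially fulfilled since according to Eq. (A.15)

$$m \rho^2 \dot{\varphi} = |\ell| = \text{const} . \tag{A.21}$$

However, we solve Eq. (A.21) for $\dot{\varphi}$

$$\dot{\varphi} = \frac{|\ell|}{m \rho^2} , \tag{A.22}$$

plug (A.22) into (A.18), and obtain

$$m \ddot{\rho} - \frac{|\ell|^2}{m \rho^3} + \frac{\partial}{\partial \rho} U(\rho) = 0 . \tag{A.23}$$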

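We make use of the time invariance of the Lagrange function (A.16). This
function does not explicitly depend on time t and we have

$$\frac{\partial}{\partial t} L = 0 . \tag{A.24}$$

This implies conservation of energy, as can easily be demonstrated. For this purpose,
we regard the total time derivative of the Lagrange function L

$$\frac{d}{dt} L = \dot{\rho} \frac{\partial}{\partial \rho} L + \ddot{\rho} \frac{\partial}{\partial \dot{\rho}} L + \ddot{\varphi} \frac{\partial}{\partial \dot{\varphi}} L + \frac{\partial}{\partial t} L , \tag{A.25}$$

and solve for $\frac{\partial}{\partial t} L$. With the help of Lagrange's equations (A.17) and (A.19) this gives

$$\frac{\partial}{\partial t} L = \frac{d}{dt} \left( L - \dot{\rho} \frac{\partial}{\partial \dot{\rho}} L - \dot{\varphi} \frac{\partial}{\partial \dot{\varphi}} L \right) = 0 . \tag{A.26}$$

Consequently

$$\dot{\rho} \frac{\partial}{\partial \dot{\rho}} L + \dot{\varphi} \frac{\partial}{\partial \dot{\varphi}} L - L = \text{const} , \tag{A.27}$$

which states the conservation of energy. We evaluate this expression with the help
of Eq. (A.16). We obtain

$$\dot{\rho} \frac{\partial}{\partial \dot{\rho}} L + \dot{\varphi} \frac{\partial}{\partial \dot{\varphi}} L - L = \frac{m}{2} \left( \dot{\rho}^2 + \rho^2 \dot{\varphi}^2 \right) + U(\rho) = \frac{m}{2} \dot{\rho}^2 + \frac{|\ell|^2}{2 m \rho^2} + U(\rho) = E . \tag{A.28}$$

Here we employed, in the second step, relation (A.22). In summary, time invariance
resulted in:

$$\frac{m}{2} \dot{\rho}^2 + \frac{|\ell|^2}{2 m \rho^2} + U(\rho) = E . \tag{A.29}$$

This is a first order differential equation in $\rho$.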

The steps required for a solution of the two-body problem can now be
outlined: (i) Calculate R(t) according to Eq. (A.10), (ii) solve Eq. (A.29) in order to
obtain $\rho(t)$, (iii) plug $\rho(t)$ into Eq. (A.22) and solve for $\varphi(t)$, (iv) since z(t) = 0, the
original vectors $r_1(t)$ and $r_2(t)$ can be constructed from R(t), $\rho(t)$ and $\varphi(t)$. All integration
constants are uniquely determined by the initial conditions of the problem at hand.
From Eq. (A.29) we obtain

$$\dot{\rho} = \pm \sqrt{\frac{2}{m} \left[ E - U(\rho) - \frac{|\ell|^2}{2 m \rho^2} \right]} , \tag{A.30}$$

which results in an implicit equation for $\rho$

$$t = t_0 + \int_{\rho_0}^{\rho} d\rho'\, \frac{m \rho'}{\sqrt{2 m \rho'^2 \left[ E - U(\rho') \right] - |\ell|^2}} , \tag{A.31}$$

where we defined $\rho_0 = \rho(t_0)$, $t_0$ is some initial time, and we neglected the negative
root. Equation (A.31) defines t as a function of $\rho$, $t = t(\rho)$, which has to be inverted
to, finally, obtain the required solution $\rho = \rho(t)$. Whether Eq. (A.31) can be solved
analytically depends on the particular form of the potential $U(\rho)$. If Eq. (A.31)
cannot be solved analytically one has to employ numerical approximations.

Finally, the angle $\varphi$ can be expressed as a function of the radius $\rho$, i.e. $\varphi = \varphi(\rho)$.
We get from Eqs. (A.22) and (A.30)

$$\frac{d\varphi}{d\rho} = \frac{d\varphi}{dt} \frac{dt}{d\rho} = \pm \frac{|\ell|}{m \rho^2} \left\{ \frac{2}{m} \left[ E - U(\rho) \right] - \frac{|\ell|^2}{m^2 \rho^2} \right\}^{-1/2} , \tag{A.32}$$

integrate over $\rho$, and find the desired relation

$$\varphi = \varphi_0 \pm |\ell| \int_{\rho_0}^{\rho} \frac{d\rho'}{\rho' \sqrt{2 m \rho'^2 \left[ E - U(\rho') \right] - |\ell|^2}} , \tag{A.33}$$

where $\varphi_0 = \varphi(t_0)$.
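
When the integrals cannot be evaluated in closed form, one numerical alternative is to integrate the equations of motion (A.22) and (A.23) directly and to use the conserved energy (A.29) as a check. The sketch below does this with a classical fourth order Runge-Kutta step. It is an illustration under stated assumptions: the attractive potential $U(\rho) = -k/\rho$, all parameter values, the step size and the function names are hypothetical choices.

```python
def rhs(state, m, ell, dU):
    """Right-hand side for the state (rho, rho_dot, phi)."""
    rho, rho_dot, phi = state
    return (rho_dot,
            ell**2 / (m**2 * rho**3) - dU(rho) / m,   # Eq. (A.23) solved for the acceleration
            ell / (m * rho**2))                       # Eq. (A.22)


def rk4_step(state, dt, m, ell, dU):
    """One classical fourth order Runge-Kutta step."""
    def shifted(k, h):
        return tuple(s + h * ki for s, ki in zip(state, k))
    k1 = rhs(state, m, ell, dU)
    k2 = rhs(shifted(k1, dt / 2), m, ell, dU)
    k3 = rhs(shifted(k2, dt / 2), m, ell, dU)
    k4 = rhs(shifted(k3, dt), m, ell, dU)
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def energy(state, m, ell, U):
    """Conserved energy, Eq. (A.29)."""
    rho, rho_dot, _ = state
    return 0.5 * m * rho_dot**2 + ell**2 / (2 * m * rho**2) + U(rho)


m, ell, k = 1.0, 1.1, 1.0             # reduced mass, |l|, coupling (assumed values)
U = lambda rho: -k / rho              # assumed potential
dU = lambda rho: k / rho**2           # its derivative with respect to rho

state = (1.0, 0.0, 0.0)               # rho_0, rho_dot_0, phi_0
E0 = energy(state, m, ell, U)
for _ in range(2000):
    state = rk4_step(state, 0.005, m, ell, dU)
drift = abs(energy(state, m, ell, U) - E0)
print(state, drift)
```

Monitoring the drift of E is a cheap way to judge whether the chosen step size is adequate.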


Appendix  B 

Solving  Non-linear  Equations:  The  Newton 
Method 


We give a brief introduction into the solution of non-linear equations with the help
of Newton's method. We regard a differentiable function F(x) and we would like
to find the solution of the equation

$$F(x) = 0 . \tag{B.1}$$

The simplest approach is to transform the equation into an equation of the form

$$x = f(x) , \tag{B.2}$$

which is always possible. This equation could be solved iteratively by simply
repeating

$$x_{t+1} = f(x_t) , \tag{B.3}$$

where we start with some initial value $x_0$. If this method converges, one can
approximate the solution arbitrarily closely; however, convergence is not guaranteed
and will in fact depend on the transformation from Eq. (B.1) to Eq. (B.2). A more
advanced technique is the so-called Newton method (also referred to as the
Newton-Raphson method) [6, 7]. It is based on the definition of f(x) as

$$f(x) = x - \frac{F(x)}{F'(x)} , \tag{B.4}$$

which allows the iteration

$$x_{t+1} = x_t - \frac{F(x_t)}{F'(x_t)} . \tag{B.5}$$

Here F'(x) denotes the derivative of F(x) with respect to x. The convergence
behavior of the iteration (B.5) highly depends on the form of the function F(x)
and on the choice of the starting point $x_0$. The routine can be regarded as converged
if $|x_{t+1} - x_t| < \epsilon$, where $\epsilon$ is the accuracy required.
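
The iteration (B.5) together with the convergence criterion can be sketched as follows. The function name, the default tolerance, the iteration limit and the test function $F(x) = x^2 - 2$ are assumptions made for illustration only.

```python
def newton(F, dF, x0, eps=1e-12, max_iter=50):
    """Iterate Eq. (B.5) until |x_{t+1} - x_t| < eps."""
    x = x0
    for _ in range(max_iter):
        x_new = x - F(x) / dF(x)       # Eq. (B.5)
        if abs(x_new - x) < eps:
            return x_new
        x = x_new
    raise RuntimeError("Newton iteration did not converge")


# Hypothetical example: the positive root of F(x) = x^2 - 2
root = newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, x0=1.0)
print(root)
```

The iteration limit acts as an emergency exit, since convergence depends on F(x) and on the starting point.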

If F(x) is not differentiable one can use the regula falsi or employ stochastic
methods which are discussed in the second part of this book. The iteration of the
method known as regula falsi is [6, 7]

$$x_{t+1} = x_t - F(x_t)\, \frac{x_t - x_{t-1}}{F(x_t) - F(x_{t-1})} . \tag{B.6}$$
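
The update (B.6) needs two starting values and no derivative. The sketch below applies the formula exactly as written, i.e. without the sign-based bracketing of the root that is often associated with the name regula falsi; in this plain form the update is also commonly called the secant iteration. The function name, the tolerances and the test function are assumptions.

```python
import math


def regula_falsi_update(F, x_prev, x, eps=1e-12, max_iter=100):
    """Iterate Eq. (B.6) starting from the two values x_prev and x."""
    for _ in range(max_iter):
        f_prev, f = F(x_prev), F(x)
        if f == f_prev:
            raise ZeroDivisionError("F(x_t) equals F(x_{t-1}); update undefined")
        x_new = x - f * (x - x_prev) / (f - f_prev)   # Eq. (B.6)
        if abs(x_new - x) < eps:
            return x_new
        x_prev, x = x, x_new
    raise RuntimeError("iteration did not converge")


# Hypothetical example: solve cos(x) - x = 0
root = regula_falsi_update(lambda x: math.cos(x) - x, 0.0, 1.0)
print(root)
```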


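A more detailed discussion on methods to solve transcendental equations numerically
can be found in any textbook on numerical methods, see for instance
Refs. [8, 9]. We shall also briefly introduce the case of a non-linear system of
equations of the form (B.1) where $F(x) \in \mathbb{R}^N$ and $x \in \mathbb{R}^N$. In this case the iteration
scheme is given by

$$x_{t+1} = x_t - J^{-1}(x_t)\, F(x_t) , \tag{B.7}$$

where

$$J(x) = \nabla_x F(x) = \begin{pmatrix} \frac{\partial F_1(x)}{\partial x_1} & \frac{\partial F_1(x)}{\partial x_2} & \ldots & \frac{\partial F_1(x)}{\partial x_N} \\ \frac{\partial F_2(x)}{\partial x_1} & \frac{\partial F_2(x)}{\partial x_2} & \ldots & \frac{\partial F_2(x)}{\partial x_N} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial F_N(x)}{\partial x_1} & \frac{\partial F_N(x)}{\partial x_2} & \ldots & \frac{\partial F_N(x)}{\partial x_N} \end{pmatrix} \tag{B.8}$$

is the Jacobi matrix of F(x). We can also make use of the methods discussed in
Chap. 2 to calculate numerically the derivatives in Eqs. (B.5) or (B.8).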
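
A sketch of the iteration (B.7) is given below. Instead of inverting the Jacobi matrix, it solves the linear system $J \Delta x = F$ in every step, and it approximates (B.8) by forward differences. Both choices, as well as the function names, the step width h, the tolerances and the test system, are assumptions made for illustration.

```python
def jacobian(F, x, h=1e-7):
    """Forward-difference approximation of the Jacobi matrix, Eq. (B.8)."""
    n = len(x)
    f0 = F(x)
    J = [[0.0] * n for _ in range(n)]
    for j in range(n):
        xh = list(x)
        xh[j] += h
        fh = F(xh)
        for i in range(n):
            J[i][j] = (fh[i] - f0[i]) / h
    return J


def solve(A, b):
    """Gaussian elimination with partial pivoting for A x = b."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x


def newton_system(F, x0, eps=1e-10, max_iter=50):
    """Iterate Eq. (B.7); dx solves J(x_t) dx = F(x_t)."""
    x = list(x0)
    for _ in range(max_iter):
        dx = solve(jacobian(F, x), F(x))
        x = [xi - di for xi, di in zip(x, dx)]
        if max(abs(d) for d in dx) < eps:
            return x
    raise RuntimeError("Newton iteration did not converge")


# Hypothetical example: x^2 + y^2 = 4 and x*y = 1
F = lambda v: [v[0]**2 + v[1]**2 - 4.0, v[0] * v[1] - 1.0]
solution = newton_system(F, [2.0, 0.5])
print(solution)
```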


Appendix  C 

Numerical  Solution  of  Linear  Systems 
of  Equations 


We discuss briefly two of the most important methods to solve non-homogeneous
systems of linear equations applying numerical methods. We consider a system of n
equations of the form

$$\begin{aligned} a_{11} x_1 + a_{12} x_2 + \ldots + a_{1n} x_n &= b_1 , \\ a_{21} x_1 + a_{22} x_2 + \ldots + a_{2n} x_n &= b_2 , \\ &\;\;\vdots \\ a_{n1} x_1 + a_{n2} x_2 + \ldots + a_{nn} x_n &= b_n , \end{aligned} \tag{C.1}$$

which is usually transformed into a matrix equation,

$$A x = b . \tag{C.2}$$

The coefficients of the matrix $A = \{a_{ij}\}$ as well as the vector $b = \{b_i\}$ are assumed
to be real valued and, furthermore, if

$$\sum_{i=1}^{n} |b_i| \neq 0 , \tag{C.3}$$

the problem (C.2) is referred to as non-homogeneous (inhomogeneous). The
solution of non-homogeneous linear systems of equations is one of the central
problems in numerical analysis, since numerous numerical methods, such as the
finite difference approach to a boundary value problem, see Chap. 8, can be reduced
to such a problem.

The solution of (C.2) is well defined as long as the matrix A is non-singular, i.e.
as long as

$$\det(A) \neq 0 . \tag{C.4}$$

Then the unique solution of (C.2) can be written as

$$x = A^{-1} b . \tag{C.5}$$

However, the inversion of matrix A is very complex for n > 4 and one would prefer
methods which are computationally more efficient. Basically, one distinguishes
between direct and iterative methods. Since a complete discussion of this huge topic
would be too extensive, we will mainly focus on two methods.

In contrast to iterative procedures, direct procedures do not contain any methodological
errors and can, therefore, be regarded as exact. However, these methods
are often computationally very expensive and rounding errors are in many cases
not negligible. As an example we will discuss the LU decomposition. On the other
hand, many iterative methods are fast and rounding errors can be controlled easily.
However, it is not guaranteed that an iterative procedure converges, even in cases
where the system of equations is known to have a unique solution. Moreover, the
result is an approximate solution. As an illustration for an iterative procedure we
will discuss the Gauss-Seidel method.


C.1 The LU Decomposition

The LU decomposition [6, 10] is essentially a numerical realization of GAUSSIAN
elimination which is based on a fundamental property of linear systems of equations
(C.2). This property states that the solution of the system (C.2) remains unchanged when a linear
combination of rows is added to one particular row. This property is then employed
in order to obtain a matrix in triangular form. It was demonstrated by DOOLITTLE
and CROUT [6, 10, 11] that the Gaussian elimination can be formulated as a
decomposition of the matrix A into two matrices L and U:

$$A = L U . \tag{C.6}$$

Here, U is an upper triangular matrix and L is a lower triangular matrix. In
particular, U is of the form

$$U = \begin{pmatrix} u_{11} & u_{12} & \ldots & u_{1n} \\ 0 & u_{22} & \ldots & u_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \ldots & u_{nn} \end{pmatrix} , \tag{C.7}$$

and L is of the form

$$L = \begin{pmatrix} 1 & 0 & 0 & \ldots & 0 \\ m_{21} & 1 & 0 & \ldots & 0 \\ m_{31} & m_{32} & 1 & \ldots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ m_{n1} & m_{n2} & m_{n3} & \ldots & 1 \end{pmatrix} . \tag{C.8}$$

The factorization (C.6) is referred to as LU decomposition. The corresponding
procedure can be easily identified by equating the elements in (C.6). One can
show that the following operations yield the desired result: For j = 1, 2, ..., n one
computes

$$u_{ij} = a_{ij} - \sum_{k=1}^{i-1} m_{ik} u_{kj} , \quad i = 1, 2, \ldots, j , \tag{C.9}$$

$$m_{ij} = \frac{1}{u_{jj}} \left( a_{ij} - \sum_{k=1}^{j-1} m_{ik} u_{kj} \right) , \quad i = j+1, j+2, \ldots, n , \tag{C.10}$$

with the requirement that $u_{jj} \neq 0$. Note that in this notation we used the convention
that the contribution of the sum is equal to zero if the upper boundary is less than the
lower boundary. We rewrite Eq. (C.2) with the help of the LU decomposition (C.6)

$$A x = L U x = b , \tag{C.11}$$

and by defining y = Ux, we retrieve a system of equations for the variable y:

$$L y = b . \tag{C.12}$$

The particular form of L allows us to solve the system (C.12) immediately by forward
substitution. We find the solution

$$y_i = b_i - \sum_{k=1}^{i-1} m_{ik} y_k , \quad i = 1, 2, \ldots, n , \tag{C.13}$$

and the equation

$$U x = y , \tag{C.14}$$

remains. It can be solved by backward substitution:

$$x_i = \frac{1}{u_{ii}} \left( y_i - \sum_{k=i+1}^{n} u_{ik} x_k \right) , \quad i = n, n-1, \ldots, 1 . \tag{C.15}$$
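
The formulas (C.9), (C.10), (C.13) and (C.15) translate almost literally into code. The following sketch uses plain Python lists and no pivoting, so it fails as soon as a diagonal element $u_{jj}$ vanishes; function names and the 3 x 3 example are assumptions made for illustration.

```python
def lu_decompose(A):
    """Decomposition A = L U with unit diagonal in L, Eqs. (C.9) and (C.10)."""
    n = len(A)
    L = [[float(i == j) for j in range(n)] for i in range(n)]
    U = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for i in range(j + 1):                                   # Eq. (C.9)
            U[i][j] = A[i][j] - sum(L[i][k] * U[k][j] for k in range(i))
        if U[j][j] == 0.0:
            raise ZeroDivisionError("u_jj = 0: pivoting would be required")
        for i in range(j + 1, n):                                # Eq. (C.10)
            L[i][j] = (A[i][j] - sum(L[i][k] * U[k][j] for k in range(j))) / U[j][j]
    return L, U


def lu_solve(L, U, b):
    """Solve L U x = b by forward and backward substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):                                           # Eq. (C.13)
        y[i] = b[i] - sum(L[i][k] * y[k] for k in range(i))
    x = [0.0] * n
    for i in reversed(range(n)):                                 # Eq. (C.15)
        x[i] = (y[i] - sum(U[i][k] * x[k] for k in range(i + 1, n))) / U[i][i]
    return x


# Hypothetical example with the exact solution x = (1, 1, 1)
A = [[2.0, 1.0, 1.0], [4.0, 3.0, 3.0], [8.0, 7.0, 9.0]]
b = [4.0, 10.0, 24.0]
L, U = lu_decompose(A)
x = lu_solve(L, U, b)
print(x)
```

Once L and U are known, each further right-hand side b costs only the two substitution sweeps, which is the main practical advantage over repeating the elimination.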
We note that this method can also be employed to invert the matrix A. The
strategy is based on the relation

$$A X = I , \tag{C.16}$$

where $X = A^{-1}$ is to be determined and I is the n-dimensional identity. Equation
(C.16) is equivalent to the following system of equations:

$$A x_1 = e_1 , \quad A x_2 = e_2 , \quad \ldots , \quad A x_n = e_n , \tag{C.17}$$

where the vectors $x_i$ are the columns of the unknown matrix X, i.e. $X = (x_1, x_2, \ldots, x_n)$,
and $e_i$ denotes the i-th column of the identity I.
The n equations of the system (C.17) can be solved with the help of the LU
decomposition.

Furthermore, one can easily calculate the determinant of A using the LU
decomposition. We note that

$$\det(A) = \det(L U) = \det(L) \det(U) = \det(U) . \tag{C.18}$$

Since L and U are triangular matrices, their determinants are equal to the product of
the diagonal elements, which yields det(L) = 1. Hence we have

$$\det(A) = \det(U) = \prod_{i=1}^{n} u_{ii} . \tag{C.19}$$

In conclusion we remark that there are many specialized methods which have
been designed particularly for matrices of specific forms, such as tridiagonal
matrices, symmetric matrices, block matrices, etc. Such matrices commonly appear
in physics applications. For instance, we remember that the matrix we encountered
in Sect. 8.2 within the context of a finite difference approximation of boundary
value problems, was tridiagonal. These specialized methods are usually the first
choice if one has a matrix of such a specific form because they are much faster and
more stable than methods developed for matrices of more general form. Since a full
treatment of these methods is beyond the scope of this book, we refer the interested
reader to books on numerical linear algebra, for instance Refs. [10, 11].


C.2  The  Gauss-Seidel  Method 

The Gauss-Seidel method is an iterative procedure to approximate the solution
of non-homogeneous systems of linear equations [6, 12]. The advantage of an
iterative procedure, in contrast to a direct approach, is that its formulation is in
general much simpler. However, one might have problems with the convergence
of the method, even in cases where a solution exists and is unique. We note that
the Gauss-Seidel method is of particular interest whenever one has to deal with
sparse coefficient matrices, i.e. matrices which are populated primarily by zeros.
This requirement is not too restrictive since most of
the matrices encountered in physical applications are indeed sparse. As an example
we remember the matrices arising in the context of a finite difference approach to
boundary value problems, Sect. 8.2.

Again, we use Eq. (C.1) as a starting point for our discussion. It is a requirement
of the Gauss-Seidel method that all diagonal elements of A are non-zero. We then
solve each row of (C.1) for $x_i$. This creates the following hierarchy

$$\begin{aligned} x_1 &= -\frac{1}{a_{11}} \left( a_{12} x_2 + a_{13} x_3 + \ldots + a_{1n} x_n - b_1 \right) , \\ x_2 &= -\frac{1}{a_{22}} \left( a_{21} x_1 + a_{23} x_3 + \ldots + a_{2n} x_n - b_2 \right) , \\ &\;\;\vdots \\ x_n &= -\frac{1}{a_{nn}} \left( a_{n1} x_1 + a_{n2} x_2 + \ldots + a_{n,n-1} x_{n-1} - b_n \right) , \end{aligned} \tag{C.20}$$

or in general for i = 1, ..., n

$$x_i = -\frac{1}{a_{ii}} \left( \sum_{j \neq i} a_{ij} x_j - b_i \right) . \tag{C.21}$$
We note that Eq. (C.21) can be rewritten as a matrix equation

$$x = C x + \hat{b} , \tag{C.22}$$

where we defined the matrix $C = \{c_{ij}\}$ via

$$c_{ij} = \begin{cases} -\dfrac{a_{ij}}{a_{ii}} & i \neq j , \\ 0 & i = j , \end{cases} \tag{C.23}$$

and the vector $\hat{b} = \{\hat{b}_i\}$ as

$$\hat{b}_i = \frac{b_i}{a_{ii}} . \tag{C.24}$$

We recognize that Eq. (C.21) can be transformed into an iterative form with the help
of a trivial manipulation

$$x_i = x_i - \left[ x_i + \frac{1}{a_{ii}} \left( \sum_{j \neq i} a_{ij} x_j - b_i \right) \right] , \tag{C.25}$$

or

$$x_i^{(t+1)} = x_i^{(t)} - \Delta x_i^{(t)} , \tag{C.26}$$

where

$$\Delta x_i^{(t)} = x_i^{(t)} + \frac{1}{a_{ii}} \left( \sum_{j=1}^{i-1} a_{ij} x_j^{(t+1)} + \sum_{j=i+1}^{n} a_{ij} x_j^{(t)} - b_i \right) . \tag{C.27}$$

Equation (C.26) in combination with (C.27) produces a sequence of vectors

$$x^{(0)}, x^{(1)}, x^{(2)}, \ldots, x^{(t)}, \ldots \tag{C.28}$$

where $x^{(0)}$ is referred to as the initialization vector or trial vector. One can prove that
if this sequence converges, it approaches the exact solution x arbitrarily closely:

$$\lim_{t \to \infty} x^{(t)} = x . \tag{C.29}$$

We remark that if the terms $x_j^{(t+1)}$ on the right hand side of Eq. (C.27) are replaced
by $x_j^{(t)}$ the method is referred to as the Jacobi method.
To terminate the Gauss-Seidel method, we need an exit condition: One should
terminate the iteration whenever:

• The approximate solution $x^{(t)}$ obeys the required accuracy $\epsilon$ or $\epsilon_r$, for instance

$$\max_i \left| x_i^{(t)} - x_i^{(t-1)} \right| < \epsilon , \tag{C.30}$$

where $\epsilon$ is the absolute error, or

$$\max_i \frac{\left| x_i^{(t)} - x_i^{(t-1)} \right|}{\left| x_i^{(t)} \right|} < \epsilon_r , \tag{C.31}$$

where $\epsilon_r$ is the relative error.

• A maximum number of iterations is reached. This condition may be
interpreted as an emergency exit which ensures that the iteration terminates even
if the process is not convergent or has still not converged.
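
The update (C.26), (C.27) and the two exit conditions can be combined as in the following sketch. It uses the absolute criterion (C.30); the function name, the default tolerance, the iteration limit and the 3 x 3 example are assumptions made for illustration.

```python
def gauss_seidel(A, b, x0=None, eps=1e-10, max_iter=1000):
    """Gauss-Seidel iteration; returns the solution and the number of sweeps."""
    n = len(b)
    x = list(x0) if x0 is not None else [0.0] * n    # trial vector x^(0)
    for t in range(1, max_iter + 1):
        max_change = 0.0
        for i in range(n):
            # x[j] already holds x_j^(t+1) for j < i and x_j^(t) for j > i
            sigma = sum(A[i][j] * x[j] for j in range(n) if j != i)
            delta = x[i] + (sigma - b[i]) / A[i][i]  # Eq. (C.27)
            x[i] -= delta                            # Eq. (C.26)
            max_change = max(max_change, abs(delta))
        if max_change < eps:                         # Eq. (C.30)
            return x, t
    raise RuntimeError("emergency exit: maximum number of iterations reached")


# Hypothetical diagonally dominant example with the exact solution (1, 2, 3)
A = [[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]]
b = [2.0, 4.0, 10.0]
x, sweeps = gauss_seidel(A, b)
print(x, sweeps)
```

Updating x in place is what distinguishes this scheme from the Jacobi method, which would compute all components from a copy of the previous iterate.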

Let us discuss one final but crucial point of this section: In many cases the convergence of the Gauss-Seidel method can be significantly improved by including a relaxation parameter $\omega$ in the iterative process. In this case the update routine (C.26) takes on the form

$$x_i^{(t+1)} = x_i^{(t)} - \omega\, \Delta x_i^{(t)}. \tag{C.32}$$

If the relaxation parameter $\omega$ obeys $\omega > 1$ one speaks of over-relaxation, if $\omega < 1$ of under-relaxation, and if $\omega = 1$ the regular Gauss-Seidel method is recovered. An appropriate choice of the relaxation parameter may accelerate the convergence of the method significantly. The best result would certainly be obtained if the ideal value of $\omega$, $\omega_i$, were known. Unfortunately, it is impossible to determine $\omega_i$ prior to the iteration in the general case. We remark the following properties:

• The method (C.32) is only convergent for $0 < \omega < 2$.

• If the matrix $C$ is positive definite and $0 < \omega < 2$, the Gauss-Seidel method converges for any choice of $x^{(0)}$ (Ostrowski-Reich theorem, [13]).

• In many cases, $1 < \omega_i < 2$. We note that this inequality holds only under particular restrictions for the matrix $C$ [see Eq. (C.23)]. However, we note without going into detail that these restrictions are almost always fulfilled when one is confronted with applications in physics.

• If $C$ is positive definite and tridiagonal, the ideal value $\omega_i$ can be calculated using

$$\omega_i = \frac{2}{1 + \sqrt{1 - \lambda^2}}, \tag{C.33}$$

where $\lambda$ is the largest eigenvalue of $C$, Eq. (C.23).

• Since the calculation of $\lambda$ is in many cases quite complex, one could employ the following idea: It is possible to prove that

$$\lim_{t \to \infty} \frac{|\Delta x^{(t+1)}|}{|\Delta x^{(t)}|} = \lambda^2. \tag{C.34}$$

Hence, one may start with $\omega = 1$, perform $t_0$ ($20 < t_0 < 100$) iterations and then approximate $\omega_i$ with the help of Eq. (C.33) and

$$\lambda^2 \approx \frac{|\Delta x^{(t_0)}|}{|\Delta x^{(t_0 - 1)}|}. \tag{C.35}$$

The iteration is then continued with the approximated value of $\omega_i$ until convergence is reached.

In  conclusion  we  remark  that  numerous  numerical  libraries  contain  sophisticated 
routines  to  solve  linear  systems  of  equations.  In  many  cases  it  is,  thus,  advisable  to 
rely  on  such  routines. 
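The Gauss-Seidel iteration with relaxation and the exit conditions described in this appendix can be sketched as follows. This is a minimal illustration in Python, not a library routine: the function name `gauss_seidel_sor`, the default tolerance and the small test system are assumptions chosen for this example only.

```python
def gauss_seidel_sor(a, b, omega=1.0, eps=1e-10, max_iter=10_000):
    """Solve a x = b iteratively; omega = 1 is the plain Gauss-Seidel method."""
    n = len(b)
    x = [0.0] * n                                  # trial vector x^(0)
    for t in range(1, max_iter + 1):
        max_change = 0.0
        for i in range(n):
            # Entries j < i already hold x_j^(t+1), entries j > i still hold x_j^(t).
            s = sum(a[i][j] * x[j] for j in range(n) if j != i)
            dx = x[i] + (s - b[i]) / a[i][i]       # Delta x_i^(t), Eq. (C.27)
            x[i] -= omega * dx                     # relaxed update, Eq. (C.32)
            max_change = max(max_change, abs(omega * dx))
        if max_change < eps:                       # absolute criterion, Eq. (C.30)
            return x, t
    # Emergency exit: the maximum number of iterations was reached.
    raise RuntimeError("no convergence within max_iter iterations")


# Hypothetical test system (tridiagonal, diagonally dominant), exact solution (1, 2, 3).
a = [[4.0, -1.0, 0.0],
     [-1.0, 4.0, -1.0],
     [0.0, -1.0, 4.0]]
b = [2.0, 4.0, 10.0]
x_gs, n_gs = gauss_seidel_sor(a, b)                # omega = 1
x_sor, n_sor = gauss_seidel_sor(a, b, omega=1.05)  # mild over-relaxation
```

Updating `x` in place is what distinguishes this from the Jacobi method, which would compute all new components from a copy of the old vector.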


Appendix  D 

Fast  Fourier  Transform 


Integral  transforms  are  indispensable  in  modern  mathematics  and  natural  science 
because  they  can  be  employed  to  simplify  complex  mathematical  problems.  In  this 
Appendix we will discuss the Fourier transform as one prominent representative
of integral transforms in general. Loosely speaking, the Fourier transform is the
unambiguous decomposition of a function $f(x)$ into its frequency components. Its
applications  range  from  the  harmonic  analysis  of  periodic  signals  to  the  solution  of 
differential  equations  and  the  description  of  wave  phenomena  in  classical  mechanics 
[2, 4],  electrodynamics  [14-16],  quantum  mechanics  [17-19],  and  many  more.  Here, 
we  briefly  discuss  its  numerical  implementation,  the  fast  Fourier  transform  (FFT) 
and  its  applications  in  Computational  Physics. 

We start by recalling the concept of Fourier series: It is asserted by Fourier's theorem that every square-integrable, $d$-periodic function $f(x)$, $f(x + d) = f(x)$, can be (uniquely) represented as¹

$$f(x) = \sum_{n=-\infty}^{\infty} \hat f_n \exp\left( i \frac{2\pi n x}{d} \right), \tag{D.1}$$

where the complex coefficients $\hat f_n \in \mathbb{C}$ are related to $f(x)$ by the inverse transform

$$\hat f_n = \frac{1}{d} \int_0^d dx\, f(x) \exp\left( -i \frac{2\pi n x}{d} \right). \tag{D.2}$$

The representation (D.1) of $f(x)$ is referred to as the Fourier series of $f(x)$ and the coefficient (D.2) is the Fourier coefficient of order $n$. Equation (D.1) is an unambiguous expansion of the function $f(x)$ into contributions which oscillate with an integer multiple of the frequency $2\pi/d$. There are numerous important properties, examples and applications of Fourier series for which we refer to the literature [21-24].

¹ In other words, the plane waves $\exp(i n 2\pi x / d)$ with period $d$ form a complete, orthonormal basis in the space of $d$-periodic, square integrable functions with the scalar product (10.10). We remark that this also applies to functions which are defined on a compact interval of length $d$ [20].

The concept of Fourier series can be generalized to the idea of the Fourier transform of a square integrable function $f(x)$ by formally letting $d \to \infty$ [23]. The Fourier transform relates the function $f(x)$ to its transform $\hat f(k)$, $k \in \mathbb{R}$, via²

$$f(x) = \int_{-\infty}^{\infty} dk\, \hat f(k) \exp(ikx), \tag{D.3}$$

and the inverse transform is obtained as

$$\hat f(k) = \frac{1}{2\pi} \int_{-\infty}^{\infty} dx\, f(x) \exp(-ikx). \tag{D.4}$$

The transform (D.3) and its inverse (D.4) can be used to considerably simplify mathematical problems. For instance, a linear differential equation for the function $f(x)$ is mapped onto a linear algebraic equation for $\hat f(k)$. The solution of the differential equation is then obtained by back-transforming the solution $\hat f(k)$ of the algebraic equation. Again, we refer to the literature for further applications and the various properties of the transforms (D.3) and (D.4). Instead, let us concentrate on the question of how to compute the Fourier transform (D.2) numerically.

It appears to be reasonable to start with the concepts developed in Chap. 3.³ For this purpose, we assume that the function $f(x)$ is solely known on a grid of $N$ equidistant grid-points $x_l$, $l = 0, \ldots, N - 1$. In addition, we note that it is sufficient to limit our discussion to $2\pi$-periodic functions. Thus, we can choose our grid-points to be $x_l = x_0 + lh$ where $x_0 = 0$ and $h = 2\pi/N$, so that $x_{N-1} = 2\pi(1 - 1/N)$.

Approximating the integral (D.2) with the help of the forward rectangular rule, Chap. 3, yields

$$\hat f_n = \frac{1}{N} \sum_{l=0}^{N-1} f_l \exp\left( -i \frac{2\pi n l}{N} \right) + O(h^2), \tag{D.5}$$

with $f_l = f(x_l)$. It follows from this equation that the coefficients $\hat f_n$ are periodic in $n$ with period $N$ due to the finite number of grid-points. Hence, the maximal number of distinct coefficients is equal to the number of grid-points. The inversion of Eq. (D.5) follows directly from Eq. (D.1) and reads⁴

$$f_l = \sum_{n=0}^{N-1} \hat f_n \exp\left( i \frac{2\pi n l}{N} \right). \tag{D.6}$$

² We work here with the asymmetric definition of the Fourier transform. For other definitions, the pre-factors have to be adapted consistently.

³ If $f(x)$ is not periodic we have to truncate the integral (D.4) and restrict the integration to a suitable finite interval so that the problem again reduces to the evaluation of Eq. (D.2).

The transforms (D.5) and (D.6) are referred to as the discrete Fourier transform (DFT) and its inverse, respectively. We cast these relations into matrix form by defining vectors $F = (f_0, \ldots, f_{N-1})$ and $\hat F = (\hat f_0, \ldots, \hat f_{N-1})$ together with the matrix $W$ of elements:

$$W_{nm} = \omega_N^{nm}. \tag{D.7}$$

Here, $\omega_N = \exp(2\pi i / N)$ denotes the $N$-th root of unity. The transformation matrix $W$ is known as the Fourier matrix or DFT matrix and it is easy to prove that its inverse $W^{-1}$ has the elements

$$\left( W^{-1} \right)_{nm} = \frac{1}{N}\, \omega_N^{-nm}. \tag{D.8}$$

All this allows us to rewrite Eqs. (D.5) and (D.6) in compact form:

$$\hat F = W^{-1} F \quad \text{and} \quad F = W \hat F. \tag{D.9}$$

Thus, we reduced the problem of numerically implementing the Fourier transform (D.2) to the task of multiplying the $N \times N$ complex matrix $W^{-1}$ with the $N$-element vector $F$. This means that we have to perform $N^2$ complex multiplications and $N(N - 1)$ complex additions. However, the symmetry $W_{nm} = W_{mn}$ already suggests that there is further room for improvement. In fact, there are methods that do much better and these algorithms are known as fast Fourier transform (FFT) algorithms.

We  limit  our  presentation  to  the  version  proposed  by  Cooley  and  Tukey  [6, 25, 
26]  which  is,  with  some  variations,  the  most  common  algorithm.  In  its  simplest  form 
it  is  based  on  the  observation  that  one  can  always  split  the  Fourier  transform  (D.5) 
into  an  even  and  an  odd  part 


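$$\hat f_n = \frac{1}{N} \sum_{l=0}^{N/2-1} f_{2l}\, \omega_N^{-2nl} + \frac{\omega_N^{-n}}{N} \sum_{l=0}^{N/2-1} f_{2l+1}\, \omega_N^{-2nl}, \tag{D.10}$$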


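provided that $N$ is even.⁵ Since $\omega_N^2 = \omega_{N/2}$, we can interpret Eq. (D.10) as the linear combination of two Fourier transforms of length $N/2$. Denoting the Fourier coefficients of the function values $f_{2l}$ on even grid-points by $\hat A_n = \frac{1}{N} \sum_{l=0}^{N/2-1} f_{2l}\, \omega_{N/2}^{-nl}$ and of the values $f_{2l+1}$ on odd grid-points by $\hat B_n = \frac{1}{N} \sum_{l=0}^{N/2-1} f_{2l+1}\, \omega_{N/2}^{-nl}$ (both keep the prefactor $1/N$ of Eq. (D.10) and thus differ from the normalization of Eq. (D.5) for $N/2$ points by a factor $1/2$), we obtain for $n = 1, \ldots, N/2$

$$\hat f_n = \hat A_n + \omega_N^{-n} \hat B_n. \tag{D.11}$$

We now make use of the property that the Fourier coefficients are periodic, i.e. $\hat A_{n+N/2} = \hat A_n$, $\hat B_{n+N/2} = \hat B_n$, and that $\omega_N^{-(n+N/2)} = -\omega_N^{-n}$. Thus, we can calculate the remaining coefficients $\hat f_n$, $n = N/2 + 1, \ldots, N$ with the help of:

$$\hat f_{n+N/2} = \hat A_n - \omega_N^{-n} \hat B_n. \tag{D.12}$$

⁴ It follows directly from the summation rule of the geometric series that

$$\sum_{m=0}^{N-1} \exp\left( \frac{2\pi i\, m n}{N} \right) = \begin{cases} N & \text{if } n \text{ is an integer multiple of } N, \\ 0 & \text{otherwise.} \end{cases}$$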

Because of Eqs. (D.11) and (D.12) the $N$ Fourier coefficients can be computed as a linear combination of two Fourier transforms of size $N/2$. The recursive application of the very same scheme to $\hat A_n$ and $\hat B_n$ constitutes the core of the FFT algorithm in its simplest variation [6]. It is also the efficiency of this algorithm that makes the Fourier transform an attractive tool for numerical calculations [27]. In fact, there are several problems in this book where an algorithm based on FFT could have been invoked. Let us discuss two examples in more detail in order to illustrate this.
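The recursion described above can be sketched in a few lines of Python. The code below is an illustrative radix-2 implementation under the normalization of Eq. (D.5), i.e. with a prefactor $1/N$ and a negative exponent; the function names `dft` and `fft` and the sample values are assumptions made for this example. Each half-length transform is normalized by $2/N$, so a factor $1/2$ appears explicitly in the combination step.

```python
import cmath


def dft(f):
    """Direct evaluation of Eq. (D.5): O(N^2) operations."""
    n_pts = len(f)
    return [sum(f[l] * cmath.exp(-2j * cmath.pi * n * l / n_pts)
                for l in range(n_pts)) / n_pts
            for n in range(n_pts)]


def fft(f):
    """Recursive radix-2 transform with the 1/N normalization of Eq. (D.5)."""
    n_pts = len(f)
    if n_pts == 1:
        return [complex(f[0])]
    if n_pts % 2:
        raise ValueError("the number of grid-points must be a power of two")
    even = fft(f[0::2])            # transform of the values on even grid-points
    odd = fft(f[1::2])             # transform of the values on odd grid-points
    half = n_pts // 2
    out = [0j] * n_pts
    for n in range(half):
        twiddle = cmath.exp(-2j * cmath.pi * n / n_pts)      # omega_N^(-n)
        # The half-length transforms carry 2/N; the factor 1/2 restores 1/N.
        out[n] = 0.5 * (even[n] + twiddle * odd[n])          # cf. Eq. (D.11)
        out[n + half] = 0.5 * (even[n] - twiddle * odd[n])   # cf. Eq. (D.12)
    return out


samples = [0.0, 1.0, 0.0, -1.0, 0.5, 2.0, -0.5, 1.5]
coeffs_fast = fft(samples)
coeffs_slow = dft(samples)
```

Both routines return the same coefficients up to rounding errors, but the recursive version needs of the order of $N \log_2 N$ operations instead of $N^2$.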

(i) In Chap. 9.3 we could have solved the stationary inhomogeneous heat equation for its Fourier coefficients followed by the back-transform.⁶ Denoting by $\hat T_n$ the Fourier coefficients of the temperature $T(x)$ and by $\hat\Gamma_n$ the Fourier coefficients of the heat source/drain, we obtain from Eq. (9.20)

$$\hat T_n = -\frac{\hat\Gamma_n}{(i n \omega)^2}, \tag{D.13}$$

where $\omega = 2\pi/L$. Performing the inverse FFT on Eq. (D.13) and adding the homogeneous solution (9.4) immediately gives the required temperature profile $T(x)$. In a similar fashion, FFT could have been used for solving the partial differential equations discussed in Chap. 11. From the examples in this chapter, the time-dependent Schrödinger equation serves as our second application.

(ii) The Hamiltonian of a free point particle (for simplicity in one dimension) is diagonal in momentum space, $H = P^2/2m$, with the momentum operator (10.6). Given the position-space representation of the initial state $\psi(x, t)$, we can then compute the time evolved wave packet $\psi(x, t + \Delta t)$ for arbitrary $\Delta t > 0$ according to Eq. (10.17) as

$$\psi(x, t + \Delta t) = \frac{1}{2\pi\hbar} \int dp\, \exp\left( -\frac{i \Delta t}{\hbar} \frac{p^2}{2m} + \frac{i}{\hbar} p x \right) \hat\psi(p, t), \tag{D.14}$$

where $\hat\psi(p, t)$ is the momentum space representation of the initial state $\psi(x, t)$, i.e. its Fourier transform with $k = p/\hbar$. Hence, the time evolution of the free wave packet is readily computed numerically with the help of the FFT and its inverse.

⁵ This is not a limitation because we can always choose $N$ to be even.

⁶ Here we use the fact that the plane waves $\exp(i n 2\pi x / L)$ form a complete, orthonormal basis of the functions defined on a compact interval of length $L$ [20].

It is now certainly interesting to investigate whether or not a similar approach can be applied to solve Eq. (10.1) in the presence of a potential $V(x)$ which is diagonal in position space. Although this can be achieved by solving the full stationary eigenvalue problem (10.9) followed by the application of the eigenvector expansion (10.17), we present here a more efficient but approximate solution valid for small time steps $\Delta t$. In order to see this, we transform Eq. (D.14) into a slightly more compact form. Denoting by $\mathcal{F}$ the Fourier transform operator, $\hat\psi(p) = \mathcal{F}\psi(x)$, we can write Eq. (D.14) as

$$\psi(x, t + \Delta t) = \mathcal{F}^{-1} U_{\Delta t} \mathcal{F} \psi(x, t), \tag{D.15}$$

where $U_{\Delta t} = \exp(-i \Delta t\, p^2 / 2\hbar m)$ is the unitary time evolution operator⁷ for the time interval $\Delta t$. The correct result cannot be obtained by multiplying Eq. (D.15) with the position-space time evolution of the potential $V_{\Delta t} = \exp(-i \Delta t\, V(x)/\hbar)$ because the operators $V$ and $P$ do not commute. However, by applying the Baker-Campbell-Hausdorff formula⁸ [17, 19, 28], we can approximate the time evolution $\psi(x, t) \to \psi(x, t + \Delta t)$ for a small time step $\Delta t$ by:

$$\psi(x, t + \Delta t) = V_{\Delta t} \mathcal{F}^{-1} U_{\Delta t} \mathcal{F} \psi(x, t) + O(\Delta t^2). \tag{D.16}$$

An even better approximation is obtained by the symmetrized form

$$\psi(x, t + \Delta t) = \mathcal{F}^{-1} U_{\Delta t/2} \mathcal{F}\, V_{\Delta t}\, \mathcal{F}^{-1} U_{\Delta t/2} \mathcal{F} \psi(x, t) + O(\Delta t^3). \tag{D.17}$$

This method, known as the split operator technique [29], is frequently used to solve time dependent problems in quantum mechanics numerically with the help of FFT.
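A single symmetrized step of the form (D.17) may be sketched with NumPy's FFT routines as follows. The example is an illustration only: the function name `split_operator_step`, the units $\hbar = m = 1$, the grid parameters and the harmonic test potential are assumptions that do not appear in the text. The prefactors of `np.fft.fft` and `np.fft.ifft` differ from Eq. (D.5), but they cancel because every forward transform is followed by an inverse one.

```python
import numpy as np


def split_operator_step(psi, v_x, k, dt, hbar=1.0, m=1.0):
    """One step of Eq. (D.17): kinetic half step, potential step, kinetic half step."""
    u_half = np.exp(-1j * (dt / 2) * hbar * k**2 / (2 * m))  # U_{dt/2} with p = hbar k
    v_full = np.exp(-1j * dt * v_x / hbar)                   # V_{dt} in position space
    psi = np.fft.ifft(u_half * np.fft.fft(psi))
    psi = v_full * psi
    return np.fft.ifft(u_half * np.fft.fft(psi))


# Hypothetical test case: harmonic oscillator V(x) = x^2 / 2 on a periodic grid.
n_grid, length = 256, 20.0
x = np.linspace(-length / 2, length / 2, n_grid, endpoint=False)
dx = length / n_grid
k = 2 * np.pi * np.fft.fftfreq(n_grid, d=dx)   # wave numbers in np.fft ordering
v_x = 0.5 * x**2
psi0 = np.pi**-0.25 * np.exp(-x**2 / 2)        # ground state, stationary up to a phase
psi = psi0.astype(complex)
for _ in range(100):
    psi = split_operator_step(psi, v_x, k, dt=0.01)

norm = np.sum(np.abs(psi)**2) * dx
density_change = np.max(np.abs(np.abs(psi)**2 - psi0**2))
```

Since every factor in Eq. (D.17) is unitary, the norm is conserved up to rounding errors, and for the stationary state chosen here the probability density should remain essentially unchanged.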


⁷ $U_t$ is the momentum space representation of the free unitary time evolution operator $U = \exp(-i t P^2 / 2\hbar m)$.

⁸ The Baker-Campbell-Hausdorff formula states how the exponential function $\exp(X + Y)$ of two non-commuting operators $X$ and $Y$ can be expanded in terms of products of exponentials of their commutators [17, 28].


Appendix  E 

Basics  of  Probability  Theory 


E.1 Classical Definition


It  is  the  aim  of  this  Appendix  to  summarize  the  most  important  definitions  and 
results  from  basic  probability  theory  as  required  within  this  book.  For  a  more  in 
depth  presentation  we  refer  to  the  literature  [30-34]. 

The classical probability $P(A)$ for an event $A$ is defined by the number of favorable results $n$, divided by the number of possible results $m$,

$$P(A) = \frac{n}{m}. \tag{E.1}$$

For two events $A$ and $B$ we can deduce the following rules¹

$$P(A \vee B) = P(A) + P(B) - P(A \wedge B), \tag{E.2a}$$
$$P(Z) = 0 \quad \text{impossible event; } Z \ldots \text{zero element}, \tag{E.2b}$$
$$P(I) = 1 \quad \text{certain event; } I \ldots \text{identity element}, \tag{E.2c}$$
$$0 \le P(A) \le 1, \tag{E.2d}$$
$$P(A|B) = \frac{P(A \wedge B)}{P(B)}, \tag{E.2e}$$

where $P(A|B)$ is the probability for the event $A$ under the constraint that event $B$ is true. Moreover, if $\bar A$ is the complementary event to $A$² we have

$$P(\bar A) = 1 - P(A). \tag{E.3}$$

The statistical definition of the probability for an event $A$ is given by:

$$P(A) = \lim_{m \to \infty} \frac{n}{m}. \tag{E.4}$$

¹ Here we use the symbols $\vee$ and $\wedge$ to denote the Boolean operators OR and AND, respectively.


E.2  Random  Variables  and  Moments 

A random variable is a functional which assigns to an event $\omega$ a real number $x$ from the set of possible outcomes $\Omega$: $x = X(\omega)$ [31]. Roughly speaking it is a variable whose value is assigned to the observation of some random process.³ The mean value of a discrete random variable $X$ is defined by

$$\langle X \rangle = \sum_{\omega \in \Omega} X(\omega) P_\omega, \tag{E.5}$$

where $P_\omega$ is the probability for the event $\omega$. For instance, in case of a dice-throw $X(\omega) = n = 1, 2, \ldots, 6$.

We restrict ourselves now to discrete random variables and, thus, $x$ can only take on discrete values. Furthermore, we introduce the function of random variables $Y = f(X)$ and define quite generally its mean value:

$$\langle f(X) \rangle = \langle f \rangle = \sum_i f(x_i) P_i. \tag{E.6}$$


Note  that 


i 

(E.7)

Moments of order $k$ of a random variable $X$ are defined by

$$m_k := \langle X^k \rangle, \tag{E.8}$$

and central moments are introduced via the relation

$$\mu_k := \langle (\Delta X)^k \rangle = \langle (X - \langle X \rangle)^k \rangle. \tag{E.9}$$

² This means that $A \vee \bar A = I$ and $A \wedge \bar A = Z$.

³ A more exact formulation will follow in the course of this Appendix.

Of particular interest is the second central moment, the variance:

$$\mathrm{var}(X) := \langle (X - \langle X \rangle)^2 \rangle = \langle X^2 \rangle - \langle X \rangle^2. \tag{E.10}$$

Finally, the standard deviation $\sigma$ is defined as the square root of the variance:

$$\sigma := \mathrm{std}(X) = \sqrt{\mathrm{var}(X)}. \tag{E.11}$$

We study now a discrete set of observations $x_i$ where $i = 1, \ldots, N$. Then the sample mean value is given by

$$\bar x = \frac{1}{N} \sum_{i=1}^{N} x_i, \tag{E.12}$$

and the error (standard deviation) of $\bar x$ (standard error) can be determined from:

$$\mathrm{var}(\bar x) = \mathrm{var}\left( \frac{1}{N} \sum_{i=1}^{N} x_i \right) = \frac{\sigma^2}{N}. \tag{E.13}$$

We assumed here the $x_i$ to be uncorrelated with the consequence that $\mathrm{cov}(x_i, x_j) = \mathrm{var}(x_i)\, \delta_{ij}$ [defined in Eq. (E.16)]. Therefore,

$$\text{standard error} = \frac{\sigma}{\sqrt{N}}, \tag{E.14}$$

where $\sigma$ is the standard deviation of the observations as defined above.
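As a small worked illustration of Eqs. (E.10) to (E.14), the following Python sketch computes the sample mean, the standard deviation and the standard error of a set of observations. The data values and the function name `sample_statistics` are made up for this example. The variance is evaluated with the prefactor $1/N$, i.e. Eq. (E.10) with equal weights $P_i = 1/N$; the frequently used unbiased estimator with $1/(N - 1)$ is deliberately not used here.

```python
import math


def sample_statistics(xs):
    """Return sample mean, standard deviation and standard error of the mean."""
    n = len(xs)
    mean = sum(xs) / n                               # Eq. (E.12)
    var = sum((x - mean) ** 2 for x in xs) / n       # Eq. (E.10) with P_i = 1/N
    sigma = math.sqrt(var)                           # Eq. (E.11)
    return mean, sigma, sigma / math.sqrt(n)         # Eq. (E.14)


# Hypothetical, uncorrelated observations.
data = [4.8, 5.1, 5.0, 4.9, 5.2, 5.0, 4.7, 5.3]
mean, sigma, std_err = sample_statistics(data)
```

The standard error shrinks like $1/\sqrt{N}$, so four times as many observations are needed to halve it.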

In the case of multiple random variables we can proceed as above. For instance, the expectation value of a function of two random variables is given by

$$\langle f(X, Y) \rangle := \sum_{ij} f(x_i, y_j) P_{ij}, \tag{E.15}$$

and the covariance between two random variables by

$$\mathrm{cov}(X, Y) := \langle (X - \langle X \rangle)(Y - \langle Y \rangle) \rangle = \langle XY \rangle - \langle X \rangle \langle Y \rangle. \tag{E.16}$$

Fig. E.1 Uncorrelated (left panel) and positively correlated (right panel) variables X and Y


The value of the covariance together with its sign determines important properties of the random variables $X$ and $Y$ in their relation to each other:

$$\mathrm{cov}(X, Y) \begin{cases} > 0 & \text{for } Y - \langle Y \rangle > 0 \Rightarrow X - \langle X \rangle > 0 \quad \text{(positive linear correlation)}, \\ < 0 & \text{for } Y - \langle Y \rangle > 0 \Rightarrow X - \langle X \rangle < 0, \\ < 0 & \text{for } X - \langle X \rangle > 0 \Rightarrow Y - \langle Y \rangle < 0 \quad \text{(negative linear correlation)}, \\ = 0 & \text{no linear dependence between } X \text{ and } Y. \end{cases} \tag{E.17}$$

Random variables whose covariance is zero are called uncorrelated. [This property was used in the derivation of Eq. (E.13).] To give an example, Fig. E.1 compares schematically uncorrelated and positively correlated random variables $X$ and $Y$.


E.3  Binomial  Distribution  and  Limit  Theorems 


The binomial distribution is given by

$$P(k|n, p) = \binom{n}{k} p^k (1 - p)^{n-k}, \tag{E.18}$$

where $\binom{n}{k}$ is the binomial coefficient

$$\binom{n}{k} = \frac{n!}{k!\,(n - k)!}. \tag{E.19}$$

For large values of $n$ Stirling's approximation can be applied to calculate an estimate of $n!$:

$$n! = n^{n + 1/2} e^{-n} \sqrt{2\pi} \left[ 1 + O\left( n^{-1} \right) \right]. \tag{E.20}$$

Furthermore, it is easy to prove that the mean value and the variance of the binomial distribution are given by

$$\langle k \rangle = np, \tag{E.21}$$
$$\mathrm{var}(k) = np(1 - p). \tag{E.22}$$

The de Moivre-Laplace theorem states that for $\mathrm{var}(k) \gg 1$

$$P(k|n, p) \approx g(k|k_0, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[ -\frac{(k - k_0)^2}{2\sigma^2} \right], \tag{E.23}$$

where $k_0 = \langle k \rangle$ and $\sigma = \sqrt{\mathrm{var}(k)}$. We can also deduce that

$$P(k = np|n, p) \approx \frac{1}{\sqrt{2\pi n p (1 - p)}} \to 0, \tag{E.24}$$

for $n \to \infty$. From this, Bernoulli's law of large numbers follows

$$P(|k/n - p| < \epsilon \,|\, n, p) \xrightarrow{n \to \infty} 1 \quad \forall\, \epsilon > 0. \tag{E.25}$$


E.4  POISSON  Distribution  and  Counting  Experiments 


If the mean expectation value $\mu$ is independent of the number of experiments $n$, i.e. $np = \mu = \text{const}$, it follows from Eq. (E.18) that

$$\lim_{n \to \infty} P\left( k \,\Big|\, n, p = \frac{\mu}{n} \right) = \exp(-\mu) \frac{\mu^k}{k!} =: P(k|\mu). \tag{E.26}$$

The distribution $P(k|\mu)$ is called Poisson distribution. We obtain for the Poisson distribution:

$$\langle k \rangle = \mu, \tag{E.27}$$
$$\mathrm{var}(k) = \mu. \tag{E.28}$$

It is important to note that counting experiments, as for instance radioactive decay, follow Poisson statistics. A typical counting experiment observes within the time interval $t$ (on average) $\mu$ events. This time interval is now divided into $n$ sub-intervals with $\Delta t = t/n$. If the counting events can be assumed to be independent, the process follows a binomial distribution and we have $\mu = np$. This is equivalent to $p = \mu/n$. We return to the case of radioactive decay: We count $\mu$ signals within one minute which are uniformly distributed over the time interval. The experiment is now reduced to a time interval of one second and the probability of detecting a signal consequently reduces to $\mu/60$. For $p \ll 1$ and finite $np = \mu$ the binomial distribution $P(k|n, p)$ can be approximated by $P(k|\mu)$ and we can use for large values of $\mu$

$$P(k|\mu) \approx \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[ -\frac{(k - \mu)^2}{2\sigma^2} \right], \tag{E.29}$$

with

$$\sigma = \sqrt{\mu} \approx \sqrt{k}. \tag{E.30}$$

In most experimentally relevant cases $\mu$ is unknown and is approximated by:

$$\mu = k \pm \sqrt{k}. \tag{E.31}$$
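The limits (E.26) and (E.29) can be checked numerically. The following Python sketch compares the binomial distribution with the Poisson distribution for increasing $n$ at fixed $\mu = np$, and the Poisson distribution with its Gaussian approximation for a large mean. The values of $\mu$, the ranges of $k$ and all function names are arbitrary choices for this illustration.

```python
import math


def binomial_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)          # Eq. (E.18)


def poisson_pmf(k, mu):
    return math.exp(-mu) * mu**k / math.factorial(k)            # Eq. (E.26)


def gauss_pdf(k, mu):
    # Gaussian approximation with sigma^2 = mu, Eqs. (E.29) and (E.30)
    return math.exp(-(k - mu) ** 2 / (2 * mu)) / math.sqrt(2 * math.pi * mu)


# Binomial -> Poisson for n -> infinity at fixed mu = n p.
mu = 4.0
max_diff = {}
for n in (10, 100, 10_000):
    max_diff[n] = max(abs(binomial_pmf(k, n, mu / n) - poisson_pmf(k, mu))
                      for k in range(11))

# Poisson -> Gauss for large mu.
mu_large = 100.0
gauss_diff = max(abs(poisson_pmf(k, mu_large) - gauss_pdf(k, mu_large))
                 for k in range(50, 151))
```

The largest pointwise deviation stored in `max_diff` decreases as $n$ grows, which is the numerical counterpart of the limit (E.26).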


E.5  Continuous  Variables 

We define the cumulative distribution function (cdf) [31, 32], $F(x)$, of a continuous variable $x$ by⁴

$$F(x) := P(X \le x \,|\, \mathcal{B}), \tag{E.32}$$

where $\mathcal{B}$ is a generalized condition (condition complex). Moreover, we define the probability density function (pdf), $p(x)$, by

$$p(x) := \frac{dF(x)}{dx}. \tag{E.33}$$

It follows that

$$p(x)\, dx = [F(x + dx) - F(x)] = P(x \le X \le x + dx \,|\, \mathcal{B}). \tag{E.34}$$

Hence,

$$F(x) = \int_{-\infty}^{x} dx'\, p(x'). \tag{E.35}$$

Note that the pdf is normalized

$$\int_{-\infty}^{\infty} dx'\, p(x') = F(\infty) = P(X \le \infty \,|\, \mathcal{B}) = 1, \tag{E.36}$$

and non-negative

$$p(x) \ge 0. \tag{E.37}$$

⁴ For convenience we use here the notation $F(x)$ for the cumulative distribution function in contrast to the notation $P(x)$ used throughout the second part of this book.


E.6 Bayes' Theorem

We regard a set of discrete events $A_i$ under the generalized condition $\mathcal{B}$. Then we have the normalization condition

$$\sum_i P(A_i|\mathcal{B}) = 1, \tag{E.38}$$

and the marginalization rule

$$P(B|\mathcal{B}) = \sum_i P(B|A_i, \mathcal{B})\, P(A_i|\mathcal{B}). \tag{E.39}$$

Bayes' theorem [33, 35] for discrete variables follows from Eq. (E.2e) since $P(A \wedge B) = P(B \wedge A)$:

$$P(A|B, \mathcal{B}) = \frac{P(B|A, \mathcal{B})\, P(A|\mathcal{B})}{P(B|\mathcal{B})}. \tag{E.40}$$

In case of continuous variables the above equations modify accordingly. The marginalization and Bayes' theorem for pdfs are then given by

$$P(B|\mathcal{B}) = \int dx\, P(B|x, \mathcal{B})\, p(x|\mathcal{B}), \tag{E.41}$$

and

$$p(y|x, \mathcal{B}) = \frac{p(x|y, \mathcal{B})\, p(y|\mathcal{B})}{p(x|\mathcal{B})}. \tag{E.42}$$
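A short numerical example may help to see how the marginalization rule (E.39) and Bayes' theorem (E.40) work together. The sketch below is purely illustrative: the two hypotheses, the prior probabilities and the likelihoods are invented numbers, and the function name `posterior` is an assumption of this example.

```python
def posterior(prior, likelihood):
    """Bayes' theorem, Eq. (E.40), for a discrete set of exclusive events A_i."""
    # Marginalization rule, Eq. (E.39): P(B) = sum_i P(B|A_i) P(A_i)
    evidence = sum(p * l for p, l in zip(prior, likelihood))
    return [p * l / evidence for p, l in zip(prior, likelihood)]


# Hypothetical numbers: A_1 = "signal present", A_2 = "signal absent",
# B = "detector fires".
prior = [0.01, 0.99]            # P(A_1), P(A_2), normalized as in Eq. (E.38)
likelihood = [0.95, 0.05]       # P(B|A_1), P(B|A_2)
post = posterior(prior, likelihood)
```

With these numbers the posterior probability of `A_1` is $0.0095/0.059 \approx 0.16$: although the detector is rather reliable, the small prior keeps the posterior well below one half.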
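E.7 Normal Distribution

The normal distribution (Gauss distribution) is defined by the pdf:

$$p(x) = \mathcal{N}(x|x_0, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[ -\frac{(x - x_0)^2}{2\sigma^2} \right]. \tag{E.43}$$

The corresponding cdf

$$F(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{x} dx'\, \exp\left[ -\frac{(x' - x_0)^2}{2\sigma^2} \right] = \Phi\left( \frac{x - x_0}{\sigma} \right) = \frac{1}{2} \left[ 1 + \mathrm{erf}\left( \frac{x - x_0}{\sqrt{2\sigma^2}} \right) \right] \tag{E.44}$$

follows. Here $\Phi(x)$ is given by

$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} dx'\, e^{-x'^2/2}, \tag{E.45}$$

and $\mathrm{erf}(x)$ is the error function [36, 37]:

$$\mathrm{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x dx'\, e^{-x'^2}. \tag{E.46}$$

Furthermore, we obtain

$$\langle x \rangle = x_0, \tag{E.47}$$
$$\mathrm{var}(x) = \sigma^2. \tag{E.48}$$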
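The relation (E.44) between the cdf and the error function is easy to evaluate with the `math.erf` function of the Python standard library. The sketch below computes the probability of finding $x$ within one standard deviation of the mean and cross-checks it against a simple midpoint-rule integral of the pdf (E.43). The parameter values and function names are arbitrary choices for this illustration.

```python
import math


def normal_pdf(x, x0=0.0, sigma=1.0):
    return math.exp(-(x - x0) ** 2 / (2 * sigma**2)) / math.sqrt(2 * math.pi * sigma**2)


def normal_cdf(x, x0=0.0, sigma=1.0):
    return 0.5 * (1.0 + math.erf((x - x0) / math.sqrt(2 * sigma**2)))   # Eq. (E.44)


x0, sigma = 1.0, 2.0
# P(x0 - sigma <= X <= x0 + sigma) from the cdf.
p_one_sigma = normal_cdf(x0 + sigma, x0, sigma) - normal_cdf(x0 - sigma, x0, sigma)

# Cross-check: midpoint-rule integral of the pdf over the same interval, Eq. (E.35).
steps = 20_000
h = 2 * sigma / steps
p_numeric = h * sum(normal_pdf(x0 - sigma + (i + 0.5) * h, x0, sigma)
                    for i in range(steps))
```

Both numbers agree with the familiar value of about 68.3 %, independent of the chosen $x_0$ and $\sigma$.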


E.8  Central  Limit  Theorem 

Let $S$ denote a random variable defined by

$$S = \sum_{i=1}^{N} c_i X_i, \tag{E.49}$$

where the $X_i$ are independent and identically distributed random numbers with mean $\mu$ and variance $\sigma^2$ and

$$\lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} c_i^k = \text{const} \quad \forall k \in \mathbb{Z}. \tag{E.50}$$

Then,

$$p(S|N, \mathcal{B}) \approx \mathcal{N}[S \,|\, \langle S \rangle, \mathrm{var}(S)], \tag{E.51}$$

with

$$\langle S \rangle = \mu \sum_{i=1}^{N} c_i, \tag{E.52}$$

and

$$\mathrm{var}(S) = \sigma^2 \sum_{i=1}^{N} c_i^2, \tag{E.53}$$

for large values of $N$. The theorem of de Moivre-Laplace is a special case of the central limit theorem in which the $X_i$ are binomially distributed.
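The statement (E.51) can be illustrated without random numbers by computing the exact distribution of a sum of dice throws and comparing it with the normal distribution. In the Python sketch below all weights are set to $c_i = 1$; the number of terms, the fair die and the function name `sum_distribution` are assumptions of this example.

```python
import math


def sum_distribution(p_single, n_terms):
    """Exact distribution of the sum of n_terms iid variables (repeated convolution)."""
    dist = [1.0]                               # distribution of the empty sum
    for _ in range(n_terms):
        new = [0.0] * (len(dist) + len(p_single) - 1)
        for s, ps in enumerate(dist):
            for x, px in enumerate(p_single):
                new[s + x] += ps * px
        dist = new
    return dist


die = [0.0] + [1 / 6] * 6                      # index = face value 0..6, faces 1..6 allowed
mu, var = 3.5, 35 / 12                         # mean and variance of a single throw
n_terms = 30
pmf = sum_distribution(die, n_terms)

mean_s = n_terms * mu                          # Eq. (E.52) with c_i = 1
var_s = n_terms * var                          # Eq. (E.53) with c_i = 1


def gauss(s):
    return math.exp(-(s - mean_s) ** 2 / (2 * var_s)) / math.sqrt(2 * math.pi * var_s)


max_dev = max(abs(pmf[s] - gauss(s)) for s in range(len(pmf)))
```

Already for 30 throws the largest deviation between the exact distribution and the Gaussian is small compared with the peak height of about 0.04.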


E.9  Characteristic  Function 

The characteristic function $G(k)$ of a stochastic variable $X$ is defined by [31, 32]

$$G(k) = \langle e^{ikX} \rangle = \int_I dx\, e^{ikx} p(x), \tag{E.54}$$

where $I$ denotes the range of the pdf $p(x)$. It follows that

$$G(0) = 1 \quad \text{and} \quad |G(k)| \le 1. \tag{E.55}$$

Expanding Eq. (E.54) in a Taylor series with respect to $k$ yields

$$G(k) = \sum_m \frac{(ik)^m}{m!} \int_I dx\, x^m p(x) = \sum_m \frac{(ik)^m}{m!} \langle X^m \rangle. \tag{E.56}$$

Hence, the characteristic function is a moment generating function.

E.10  The  Correlation  Coefficient 

We shall briefly define and discuss the correlation coefficient. Two random variables $X$ and $Y$ form a random vector $Z = (X, Y)$ which follows the pdf $p(Z) = p(X, Y)$ with the normalization

$$\int dx\, dy\, p(x, y) = 1. \tag{E.57}$$

The correlation coefficient $r$ is now defined as

$$r = \frac{\mathrm{cov}(X, Y)}{\sqrt{\mathrm{var}(X)\, \mathrm{var}(Y)}}, \tag{E.58}$$

where $\mathrm{cov}(X, Y)$ is the covariance (E.16) of $X$ and $Y$ while $\mathrm{var}(\cdot)$ denotes the variance (E.10) of the respective argument. It follows from the Cauchy-Schwarz inequality⁵ that $0 \le r^2 \le 1$ and, therefore, $-1 \le r \le 1$.

The random variables $X$ and $Y$ are said to be more strongly correlated the larger $r^2$ becomes, because for statistically independent (uncorrelated) variables we have $p(x, y) = q_1(x) q_2(y)$ with the consequence that $\mathrm{cov}(X, Y) = 0$ and, thus, $r = 0$.

The definition of the correlation coefficient is commonly motivated by the problem of linear regression: Suppose we have a set of data points $Y$ associated with data points $X$. We would like to find a linear function $f(X) = a + bX$ which approximates the data points $Y$ as well as possible. The problem may be stated as

$$\langle [Y - f(X)]^2 \rangle = \langle (Y - a - bX)^2 \rangle \to \min, \tag{E.60}$$

where $a$ and $b$ are real constants. This corresponds to Gauss's method of minimizing the square of errors. We have

$$\frac{\partial}{\partial a} \langle (Y - a - bX)^2 \rangle = -2 \langle Y - a - bX \rangle = 0, \tag{E.61}$$

and

$$\frac{\partial}{\partial b} \langle (Y - a - bX)^2 \rangle = -2 \langle (Y - a - bX) X \rangle = 0. \tag{E.62}$$

Equations (E.61) and (E.62) result in:

$$a + b \langle X \rangle = \langle Y \rangle, \tag{E.63}$$
$$a \langle X \rangle + b \langle X^2 \rangle = \langle XY \rangle. \tag{E.64}$$

Both are easily solved for $a$ and $b$ and one obtains

$$a = \langle Y \rangle - b \langle X \rangle, \tag{E.65}$$

where

$$b = \frac{\langle XY \rangle - \langle X \rangle \langle Y \rangle}{\langle X^2 \rangle - \langle X \rangle^2} = \frac{\mathrm{cov}(X, Y)}{\mathrm{var}(X)}. \tag{E.66}$$

Thus, the linear function $f(X)$ which approximates the data points $Y$ optimally is given by

$$f(X) = \langle Y \rangle + \frac{\mathrm{cov}(X, Y)}{\mathrm{var}(X)} (X - \langle X \rangle) = \langle Y \rangle + r \sqrt{\frac{\mathrm{var}(Y)}{\mathrm{var}(X)}}\, (X - \langle X \rangle), \tag{E.67}$$

and it follows immediately for the squared error:

$$\langle [Y - f(X)]^2 \rangle = \mathrm{var}(Y)\, (1 - r^2). \tag{E.68}$$

Hence, the best result is achieved for $r = \pm 1$ in which case the association of the data points $Y$ with the data points $X$ is really linear while the worst result is found when $r = 0$ (no association whatsoever).

⁵ One defines the scalar product between random variables $(X, Y) = \mathrm{cov}(X, Y)$ and therefore $\|X\|^2 = (X, X) = \mathrm{var}(X)$. The Cauchy-Schwarz inequality reads

$$|(X, Y)|^2 \le \|X\|^2 \|Y\|^2, \tag{E.59}$$

and therefore $0 \le r^2 \le 1$.
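The regression formulas (E.65) to (E.68) translate directly into code. The following Python sketch fits a straight line to a handful of invented data points and verifies that the mean squared error equals $\mathrm{var}(Y)(1 - r^2)$. The data and the function name `linear_fit` are assumptions made for this illustration, and all averages are plain sample averages with weight $1/N$.

```python
import math


def linear_fit(xs, ys):
    """Least-squares line f(X) = a + b X and correlation coefficient r."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum(x * y for x, y in zip(xs, ys)) / n - mean_x * mean_y   # Eq. (E.16)
    var_x = sum(x * x for x in xs) / n - mean_x**2                   # Eq. (E.10)
    var_y = sum(y * y for y in ys) / n - mean_y**2
    b = cov / var_x                                                  # Eq. (E.66)
    a = mean_y - b * mean_x                                          # Eq. (E.65)
    r = cov / math.sqrt(var_x * var_y)                               # Eq. (E.58)
    return a, b, r, var_y


# Hypothetical data scattered around a straight line.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.7]
a, b, r, var_y = linear_fit(xs, ys)

# Mean squared error of the fit, to be compared with Eq. (E.68).
mse = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys)) / len(xs)
```

For these data one finds $a = 1.12$, $b = 1.94$ and $r \approx 0.998$, i.e. an almost perfectly linear association.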


E.11 Stable Distributions

A stable distribution is a distribution which reproduces itself [32]. In particular, consider two random variables $X_1$ and $X_2$ which are independent copies⁶ of the random variable $X$ following the distribution $p_X$.

The pdf $p_X$ is referred to as a stable distribution if for arbitrary constants $a$ and $b$ the random variable $aX_1 + bX_2$ has the same distribution as the random variable $cX + d$ for some positive $c$ and some $d \in \mathbb{R}$.

For this case one can write down the characteristic function analytically. We will give a special case, the so-called symmetric Lévy distributions [38]:

$$G_\alpha(k) = \exp(-\sigma |k|^\alpha). \tag{E.69}$$

Here $\sigma > 0$ and $0 < \alpha \le 2$. The pdf of such a distribution shows the asymptotic behavior

$$p_\alpha(x) \propto \frac{\sigma}{|x|^{1+\alpha}}, \quad |x| \to \infty. \tag{E.70}$$

The normal distribution follows from Eq. (E.69) for $\alpha = 2$. Moreover, we observe from Eq. (E.70) that the variance diverges for all $\alpha < 2$. However, the existence of the variance was the criterion for the validity of the central limit theorem formulated in Sect. E.8. We note that stable distributions reproduce themselves and are attractors for sums of independent identically distributed random variables. This is referred to as the generalized central limit theorem.

We remark, in conclusion, that for $\alpha = 1$ the Cauchy distribution results from Eq. (E.69), and note that stable distributions are also referred to as Lévy $\alpha$-stable distributions [32].

⁶ Independent copies of a random variable are random variables which are independent and follow the same distribution as the original random variable.
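The defining property of stable distributions can be verified directly on the level of the characteristic function (E.69). For independent copies the characteristic function of $aX_1 + bX_2$ is the product $G_\alpha(ak)\, G_\alpha(bk)$, which equals $G_\alpha(ck)$ with $c = (|a|^\alpha + |b|^\alpha)^{1/\alpha}$ and $d = 0$. The Python sketch below checks this identity for a few values of $\alpha$; the function names and the chosen constants are assumptions of this example.

```python
import math


def g_alpha(k, alpha, sigma=1.0):
    """Characteristic function of a symmetric Levy distribution, Eq. (E.69)."""
    return math.exp(-sigma * abs(k) ** alpha)


def scale_c(a, b, alpha):
    """Scale factor c for which a X_1 + b X_2 is distributed like c X."""
    return (abs(a) ** alpha + abs(b) ** alpha) ** (1 / alpha)


a, b = 3.0, 4.0
max_err = 0.0
for alpha in (0.5, 1.0, 1.5, 2.0):
    c = scale_c(a, b, alpha)
    for k in (0.01, 0.1, 0.5, 1.0):
        lhs = g_alpha(a * k, alpha) * g_alpha(b * k, alpha)   # sum of independent copies
        rhs = g_alpha(c * k, alpha)                           # rescaled single variable
        max_err = max(max_err, abs(lhs - rhs))
```

For $\alpha = 2$ this reproduces the familiar addition of variances, $c = \sqrt{a^2 + b^2} = 5$, while for the Cauchy case $\alpha = 1$ the scale factors simply add, $c = 7$.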


Appendix  F 

Phase  Transitions 


F.1 Some Basics

In many systems transitions between different phases can be observed if an external parameter, such as the temperature or the particle density, changes. Familiar examples are the liquid-gaseous phase transition or the ferromagnetic-paramagnetic transition. The two phases exhibit different properties and often develop a different physical structure, as in disorder-to-order transitions. This suggests the introduction of an order parameter $\varphi$ which is zero in one phase and takes on some finite value $\varphi \neq 0$ in the other one.¹ For instance, in the case of paramagnetic-ferromagnetic transitions the magnetization plays the role of the order parameter [43].


In order to classify phase transitions we briefly repeat some basics from statistical mechanics [39-42, 44, 45]. In a canonical ensemble the probability to find the system in micro-state $r$ (as a function of the external parameters temperature $T$, volume $V$ and number of particles $N$) is proportional to the Boltzmann factor

$$P_r(T, V, N) \propto \exp(-\beta E_r). \tag{F.1}$$

Here, $\beta = 1/(k_B T)$, where $k_B$ is the Boltzmann constant, $T$ is the temperature, and $E_r$ is the energy of micro-state $r$. The canonical partition function $Z(T, V, N)$,

$$Z(T, V, N) = \sum_r \exp(-\beta E_r), \tag{F.2}$$

ensures the normalization of $P_r(T, V, N)$ and determines the free energy $F(T, V, N)$ according to

$$F(T, V, N) = -\frac{1}{\beta} \ln Z(T, V, N). \tag{F.3}$$

¹ The choice of the order parameter may not be unique [39-42].
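For a system with only a few micro-states the quantities in Eqs. (F.1) to (F.3) can be evaluated directly. The Python sketch below does this for a hypothetical two-level system in units where $k_B = 1$; the energies, the value of $\beta$ and the function name `canonical` are arbitrary choices for this illustration.

```python
import math


def canonical(energies, beta):
    """Probabilities, partition function and free energy of a canonical ensemble."""
    weights = [math.exp(-beta * e) for e in energies]   # Boltzmann factors, Eq. (F.1)
    z = sum(weights)                                    # partition function, Eq. (F.2)
    probs = [w / z for w in weights]                    # normalized probabilities P_r
    free_energy = -math.log(z) / beta                   # Eq. (F.3)
    return probs, z, free_energy


# Hypothetical two-level system with energies 0 and 1 at beta = 1.
energies = [0.0, 1.0]
beta = 1.0
probs, z, free_energy = canonical(energies, beta)

# Consistency check F = U - T S with U = <E> and S = -sum_r P_r ln P_r.
internal_energy = sum(p * e for p, e in zip(probs, energies))
entropy = -sum(p * math.log(p) for p in probs)
```

The free energy obtained from the partition function coincides with `internal_energy - entropy / beta`, as required by thermodynamics.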

The  Ehrenfest  classification  [46]  of  phase  transitions  is  based  on  the  behavior 
of  F  near  the  transition  point:  If  F  is  a  continuous  function  of  its  variables  at 
the  transition  point  and  its  first  derivative  with  respect  to  some  thermodynamic 
variable  is  discontinuous  we  call  it  a  first  order  phase  transition.  For  instance, 
transitions  from  the  liquid  to  the  gaseous  phase  are  classified  as  first  order  phase 
transitions  because  the  density,  which  is  proportional  to  the  first  derivative  of  the 
free  energy  with  respect  to  the  chemical  potential,  changes  discontinuously  at  the 
boiling temperature $T = T_B$. We remark the following characteristics of first order
phase  transitions: 

1. The transition involves a latent heat $\Delta Q$: The system absorbs or releases energy.
   A familiar example is the latent heat of fusion in the case of melting or freezing.

2.  Both  phases  can  coexist  at  the  transition  point. 

3.  A  metastable  phase  can  be  observed. 

In  a  second  order  phase  transition  the  first  derivative  of  the  free  energy  F  with 
respect  to  some  thermodynamic  variable  is  continuous  but  the  second  derivative 
of  F  exhibits  a  discontinuity.  For  instance,  in  a  ferromagnetic  phase  transition  the 
magnetization  (first  derivative  of  F  with  respect  to  the  external  magnetic  field  B) 
changes continuously while the magnetic susceptibility $\chi$ (the second derivative of
$F$ with respect to $B$) is discontinuous at the CURIE temperature $T_c$ [43].

The  modern  classification  is  based  on  the  behavior  of  the  order  parameter  near
the  critical  point.  The  order  parameter  changes  discontinuously  for  first  order 
phase  transitions  while  it  changes  continuously  for  second  and  higher  order  phase 
transitions.  Second  order  transitions  are  typically  related  to  spontaneous  symmetry 
breaking,  as  for  instance  in  the  paramagnetic-ferromagnetic  transition.  Based  on 
this  observation,  Landau  developed  a  general  description  of  second  order  phase 
transitions  which  we  briefly  discuss  in  the  following  section. 


F.2  Landau  Theory 

We regard a second order phase transition characterized by the scalar order
parameter $\varphi$ [47]. Since $\langle\varphi\rangle$ changes continuously at $T = T_c$, it is
convenient to define $\varphi$ in such a way that $\langle\varphi\rangle|_{T > T_c} = 0$ while
$\langle\varphi\rangle|_{T < T_c} \neq 0$. For the free energy $F(T, h, \varphi)$, one chooses

$$ F(T, h, \varphi) = F_0(T) + V \left[ \frac{a (T - T_c)}{2} \varphi^2 + \frac{b}{4} \varphi^4 - h \varphi \right] , \tag{F.4} $$


where  a  and  b  are  some  material  constants  and  h  denotes  the  external  field.  This 
ansatz  is  motivated  by  the  theory  of  the  paramagnetic-ferromagnetic  phase  transition 
[43].  Thus,  in  equilibrium  we  have 


$$ \frac{\partial F}{\partial \varphi} = 0 , \tag{F.5} $$

which results in

$$ a (T - T_c) \varphi + b \varphi^3 = h . \tag{F.6} $$


For $h = 0$ and $T < T_c$ we obtain

$$ \langle\varphi_0\rangle = \sqrt{\frac{a}{b} (T_c - T)} \sim (T_c - T)^{\gamma} , \tag{F.7} $$


where $\gamma = 1/2$ is called the critical exponent. For $T > T_c$ we have
$\langle\varphi_0\rangle = 0$. We now regard a weak external field $h$. The order parameter
will change

$$ \langle\varphi\rangle = \langle\varphi_0\rangle + \delta\varphi . \tag{F.8} $$


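Again, we obtain for equilibrium:

$$ \frac{\partial F}{\partial \varphi} = a (T - T_c) \left( \langle\varphi_0\rangle + \delta\varphi \right) + b \left( \langle\varphi_0\rangle + \delta\varphi \right)^3 - h = 0 . \tag{F.9} $$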


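Neglecting contributions of order $\mathcal{O}(\delta\varphi^2)$ and inserting
$\langle\varphi_0\rangle$ from Eq. (F.7) yields for the susceptibility

$$ \chi = \frac{\partial}{\partial h} \langle\varphi\rangle =
\begin{cases}
\dfrac{1}{a (T - T_c)} & \text{for } T > T_c , \\[2mm]
\dfrac{1}{2 a (T_c - T)} & \text{for } T < T_c ,
\end{cases}
\quad \sim |T - T_c|^{\delta} , \tag{F.10} $$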


where $\delta = -1$ is a second critical exponent. This is the Curie-Weiss law [43].
Finally, for $T = T_c$ we obtain from Eq. (F.6)

$$ \varphi = \left( \frac{h}{b} \right)^{1/3} \sim h^{\varepsilon} , \tag{F.11} $$

with the third critical exponent $\varepsilon = 1/3$. The LANDAU theory is a mean-field
approximation since local fluctuations of the order parameter are neglected. Although the
critical exponents obtained with Landau's approach deviate from experimental
values, the theory is qualitatively correct. We remark that the critical exponents are
universal (a property referred to as universality [40]) as they depend only on the
dimensionality and the symmetry of the interaction.
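The mean-field exponents can be checked numerically. The following Python lines are only a sketch under stated assumptions: the helper `order_parameter` and the values $a = b = T_c = 1$ are illustrative choices that do not appear in the text. The equilibrium condition (F.6) is solved by bisection for its largest non-negative root, and the result is compared with the square-root law (F.7), with the susceptibility on both sides of $T_c$, and with the cube-root law on the critical isotherm.

```python
import math

def order_parameter(T, h, a=1.0, b=1.0, Tc=1.0):
    """Largest root phi >= 0 of a*(T - Tc)*phi + b*phi**3 = h, assuming h >= 0."""
    def g(phi):
        return a * (T - Tc) * phi + b * phi**3 - h

    # For phi > 0 the function g has its minimum at phi_min and grows beyond it,
    # so the wanted root is bracketed by [phi_min, hi] with g(hi) > 0.
    lo = math.sqrt(max(a * (Tc - T), 0.0) / (3.0 * b))
    hi = max(1.0, 2.0 * lo)
    while g(hi) <= 0.0:
        hi *= 2.0
    for _ in range(200):          # plain bisection down to machine precision
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# Zero-field order parameter below Tc, to be compared with sqrt(a*(Tc - T)/b).
phi0 = order_parameter(0.9, 0.0)

# Susceptibility from a weak field, below and above Tc.
h = 1e-6
chi_below = (order_parameter(0.9, h) - order_parameter(0.9, 0.0)) / h
chi_above = (order_parameter(1.1, h) - order_parameter(1.1, 0.0)) / h

# Critical isotherm T = Tc, to be compared with (h/b)**(1/3).
phi_c = order_parameter(1.0, 0.008)

print(phi0, chi_below, chi_above, phi_c)
```

With these illustrative parameters the susceptibility should come out close to $1/[2a(T_c - T)]$ below and $1/[a(T - T_c)]$ above the critical temperature, which is the behavior $|T - T_c|^{-1}$ of the Curie-Weiss law.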


2 The extension to space dependent order parameters is referred to as Ginzburg-Landau theory
[48].


Appendix  G 

Fractional  Integrals  and  Derivatives  in  1D


This  section  introduces  briefly  the  common  definitions  and  notations  associated  with 
fractional  calculus  in  one  dimension  [49] . 

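The Riemann-Liouville fractional integrals $I^{\alpha}_{a+} f(x)$ and $I^{\alpha}_{b-} f(x)$
of order $\alpha \in \mathbb{C}$ [$\Re(\alpha) > 0$] on a finite interval $[a, b]$ on the real
axis $\mathbb{R}$ are given by

$$ I^{\alpha}_{a+} f(x) := \frac{1}{\Gamma(\alpha)} \int_a^x \mathrm{d}x' \, \frac{f(x')}{(x - x')^{1 - \alpha}} \quad \text{for } x > a, \ \Re(\alpha) > 0 , \tag{G.1a} $$

$$ I^{\alpha}_{b-} f(x) := \frac{1}{\Gamma(\alpha)} \int_x^b \mathrm{d}x' \, \frac{f(x')}{(x' - x)^{1 - \alpha}} \quad \text{for } x < b, \ \Re(\alpha) > 0 , \tag{G.1b} $$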


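where $\Gamma(x)$ denotes the Gamma function [36, 37], $\Re(\alpha)$ is the real part of $\alpha$, and
$f(x)$ is a sufficiently well behaved continuous, differentiable function for which
the integrals in (G.1) exist. The corresponding Riemann-Liouville fractional
derivatives $D^{\alpha}_{a+} f(x)$ and $D^{\alpha}_{b-} f(x)$ of order $\alpha \in \mathbb{C}$
[$\Re(\alpha) > 0$] are defined by

$$ D^{\alpha}_{a+} f(x) := \left( \frac{\mathrm{d}}{\mathrm{d}x} \right)^{n} \left( I^{n - \alpha}_{a+} f \right)(x)
 = \frac{1}{\Gamma(n - \alpha)} \left( \frac{\mathrm{d}}{\mathrm{d}x} \right)^{n} \int_a^x \mathrm{d}x' \, \frac{f(x')}{(x - x')^{\alpha - n + 1}} \quad \text{for } x > a , \tag{G.2a} $$

and

$$ D^{\alpha}_{b-} f(x) := \left( -\frac{\mathrm{d}}{\mathrm{d}x} \right)^{n} \left( I^{n - \alpha}_{b-} f \right)(x)
 = \frac{1}{\Gamma(n - \alpha)} \left( -\frac{\mathrm{d}}{\mathrm{d}x} \right)^{n} \int_x^b \mathrm{d}x' \, \frac{f(x')}{(x' - x)^{\alpha - n + 1}} \quad \text{for } x < b , \tag{G.2b} $$

with $n = [\Re(\alpha)] + 1$. Here $[\Re(\alpha)]$ denotes the integer part of $\Re(\alpha)$. For
$a \to -\infty$ and $b \to +\infty$ the Riemann-Liouville fractional integrals and derivatives are
referred to as Weyl fractional integrals and derivatives. In what follows, they will
be denoted by $I^{\alpha}_{\pm}$ and $D^{\alpha}_{\pm}$, respectively.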
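A small numerical illustration of the left-sided integral (G.1a) may be helpful. The sketch below is not taken from the text: the function name `rl_integral_left`, the restriction to real $\alpha > 0$, and the midpoint rule are assumptions made for illustration. The substitution $u = (x - x')^{\alpha}$ removes the integrable endpoint singularity of the kernel, so that $I^{\alpha}_{a+} f(x) = \Gamma(\alpha + 1)^{-1} \int_0^{(x - a)^{\alpha}} \mathrm{d}u \, f(x - u^{1/\alpha})$, which a simple quadrature can handle.

```python
import math

def rl_integral_left(f, alpha, a, x, n=2000):
    """Left-sided Riemann-Liouville integral of real order alpha > 0 at x > a.

    Uses the substitution u = (x - x')**alpha and an n-point midpoint rule.
    """
    if alpha <= 0.0 or x <= a:
        raise ValueError("need alpha > 0 and x > a")
    upper = (x - a) ** alpha
    du = upper / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * du                    # midpoint of the i-th sub-interval
        total += f(x - u ** (1.0 / alpha))
    return total * du / math.gamma(alpha + 1.0)

# Half-integral of f(x') = x' on [0, 1]; the known closed form is
# Gamma(2) / Gamma(2 + alpha) * x**(1 + alpha) evaluated at x = 1.
numeric = rl_integral_left(lambda t: t, 0.5, 0.0, 1.0)
exact = 1.0 / math.gamma(2.5)
print(numeric, exact)
```

For $\alpha = 1$ the routine reduces to an ordinary integral, which offers a quick sanity check.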

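If $\alpha \in \mathbb{C}$ [$\Re(\alpha) > 0$] and $[a, b] \subset \mathbb{R}$ is a finite interval, then
the left- and right-sided CAPUTO fractional derivatives ${}^{C}D^{\alpha}_{a+} f(x)$ and
${}^{C}D^{\alpha}_{b-} f(x)$ are defined by

$$ {}^{C}D^{\alpha}_{a+} f(x) = D^{\alpha}_{a+} f(x) - \sum_{k=0}^{n-1} \frac{f^{(k)}(a)}{\Gamma(k - \alpha + 1)} (x - a)^{k - \alpha} , \tag{G.3a} $$

and

$$ {}^{C}D^{\alpha}_{b-} f(x) = D^{\alpha}_{b-} f(x) - \sum_{k=0}^{n-1} \frac{(-1)^{k} f^{(k)}(b)}{\Gamma(k - \alpha + 1)} (b - x)^{k - \alpha} , \tag{G.3b} $$

with

$$ n = \begin{cases} [\Re(\alpha)] + 1 & \alpha \notin \mathbb{N}_0 , \\ \alpha & \alpha \in \mathbb{N}_0 . \end{cases} \tag{G.3c} $$

This is, however, equivalent to

$$ {}^{C}D^{\alpha}_{a+} f(x) = \frac{1}{\Gamma(n - \alpha)} \int_a^x \mathrm{d}x' \, \frac{f^{(n)}(x')}{(x - x')^{\alpha - n + 1}} = \left( I^{n - \alpha}_{a+} D^{n} f \right)(x) , \tag{G.4a} $$

and

$$ {}^{C}D^{\alpha}_{b-} f(x) = \frac{(-1)^{n}}{\Gamma(n - \alpha)} \int_x^b \mathrm{d}x' \, \frac{f^{(n)}(x')}{(x' - x)^{\alpha - n + 1}} = (-1)^{n} \left( I^{n - \alpha}_{b-} D^{n} f \right)(x) . \tag{G.4b} $$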


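The symmetric fractional integrals $I^{\alpha}_{|x|}$ and derivatives $D^{\alpha}_{|x|}$ are referred to as
Riesz fractional integrals or derivatives and are of the form

$$ I^{\alpha}_{|x|} = \frac{I^{\alpha}_{+} + I^{\alpha}_{-}}{2 \cos(\alpha\pi/2)} \tag{G.5} $$

for $\alpha \in (0, 1)$ and

$$ D^{\alpha}_{|x|} =
\begin{cases}
(-1)^{n/2} \dfrac{D^{\alpha}_{+} + D^{\alpha}_{-}}{2 \cos(\alpha\pi/2)} & \text{for } n = [\Re(\alpha)] + 1 = 2k, \ k \in \mathbb{N}_0 , \\[2mm]
\dfrac{D^{\alpha}_{+} - D^{\alpha}_{-}}{2 \sin(\alpha\pi/2)} & \text{for } n = [\Re(\alpha)] + 1 = 2k + 1, \ k \in \mathbb{N}_0 .
\end{cases} \tag{G.6} $$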


Appendix  H 

Least  Squares  Fit 


H.1  Motivation

In numerous physics applications a set of corresponding data points $(x_k, y_k)$ was
measured or calculated and a set of certain parameters $\{\alpha_j\}$ characterizing a function
$f(x_k, \{\alpha_j\})$ is to be determined in such a way that

$$ \chi^2 = \sum_k c_k \left[ y_k - f(x_k, \{\alpha_j\}) \right]^2 \to \min . \tag{H.1} $$

This is referred to as a least squares fit problem [6, 7]. Here, $c_k > 0$ are weights,
which indicate the relevance of a certain data point $(x_k, y_k)$ for the fitting routine, and
$f(x, \{\alpha_j\})$ is referred to as the model function. Besides numerous applications within
the context of experimentally obtained data points, we already came across such a
problem in our discussion of data analysis in Chap. 19. There it was of interest to
determine the experimental auto-correlation time by fitting an exponential function
to the measured auto-correlation coefficient $A(k)$, discussed in Sect. 19.3. Hence, we
note that in many applications the parameters $\{\alpha_j\}$ can be associated with a physical
property of interest.

We distinguish between two different cases: (i) the function $f(x_k, \{\alpha_j\})$ is a linear
function of the parameters $\{\alpha_j\}$ and (ii) the function $f(x_k, \{\alpha_j\})$ is not linear in its
parameters $\{\alpha_j\}$. It should be emphasized that in both cases the function does not
need to be linear in $x_k$. This section will discuss methods for linear as well as
nonlinear least squares fits. However, before proceeding some comments on the data
points $\{y_k\}$ seem to be required.

Suppose the points $(x_k, y_k)$ stem from a measurement which has been repeated
$N$ times. In this case for every value $x_k$ we have $N$ different values $\{y_k^j\}$ and we may
use the arithmetic mean

$$ \langle y_k \rangle = \frac{1}{N} \sum_j y_k^j \tag{H.2} $$

instead of $y_k$ in expression (H.1). We may also calculate the variance $\mathrm{var}(y_k)$ via¹

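$$ \mathrm{var}(y_k) = \frac{1}{N} \sum_j \left( y_k^j - \langle y_k \rangle \right)^2 . \tag{H.3} $$

If we assume that the data points $y_k^j$ follow a normal distribution with mean $\langle y_k \rangle$
and variance $\mathrm{var}(y_k)$ we may proceed in the following way: The weights $c_k$ are
chosen as

$$ c_k = \frac{1}{\mathrm{var}(y_k)} . \tag{H.4} $$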

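The resulting fit parameters $\{\alpha_j\}$ are then regarded as mean values of parameters
where the variances $\mathrm{var}(\alpha_i)$ as well as the covariances $\mathrm{cov}(\alpha_i, \alpha_j)$ can be obtained
from the matrix

$$ N_{ij} = \frac{1}{2} \frac{\partial^2 \chi^2}{\partial \alpha_i \partial \alpha_j} \tag{H.5} $$

via inversion, i.e.

$$ C = N^{-1} , \tag{H.6} $$

and

$$ C_{ij} = \mathrm{cov}(\alpha_i, \alpha_j) . \tag{H.7} $$

The matrix $C$ is commonly referred to as covariance matrix.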


1 In many cases one employs the bias corrected variance $\mathrm{var}(y_k)_B = \frac{N}{N-1} \mathrm{var}(y_k)$. For a detailed
discussion of the bias corrected variance the interested reader is encouraged to consult a statistics
textbook [45, 50-53].

H.2  Linear  Least  Squares  Fit


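In this particular case the model function $f(x_k, \{\alpha_j\})$ is defined as

$$ f(x_k, \{\alpha_j\}) = \sum_j \alpha_j \varphi_j(x_k) , \tag{H.8} $$

where $\varphi_j(x_k)$ are linearly independent basis functions, which do not have to be linear
in $x_k$. The particular case of a linear regression, discussed in Sect. E.10, is included.
Equation (H.8) specifies the model function $f(x_k, \{\alpha_j\})$ in (H.1) and this yields

$$ \chi^2 = \sum_k c_k \left[ y_k - \sum_j \alpha_j \varphi_j(x_k) \right]^2 , \tag{H.9} $$

which is supposed to tend to a minimum. We calculate

$$ \frac{\partial \chi^2}{\partial \alpha_\ell} = -2 \sum_k c_k \varphi_\ell(x_k) \left[ y_k - \sum_j \alpha_j \varphi_j(x_k) \right] = 0 , \tag{H.10} $$

and arrive at:

$$ \sum_j \alpha_j \sum_k c_k \varphi_\ell(x_k) \varphi_j(x_k) = \sum_k c_k y_k \varphi_\ell(x_k) , \quad \forall \ell . \tag{H.11} $$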


This equation can be reformulated as the linear equation

$$ M \alpha = \beta , \tag{H.12} $$

where the vectors $\alpha = (\alpha_1, \alpha_2, \ldots)^T$ and $\beta = (\beta_1, \beta_2, \ldots)^T$ with

$$ \beta_i = \sum_k c_k y_k \varphi_i(x_k) , \tag{H.13} $$

and the matrix $M$ with elements

$$ M_{ij} = \sum_k c_k \varphi_i(x_k) \varphi_j(x_k) , \tag{H.14} $$

have been introduced.

Equation (H.12) can, for instance, be solved with the help of the methods
discussed in Appendix C. It is also particularly simple to determine the covariances
because we have

$$ \frac{1}{2} \frac{\partial^2 \chi^2}{\partial \alpha_i \partial \alpha_j} = M_{ij} , \tag{H.15} $$

and the covariances follow from Eqs. (H.6) and (H.7).
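The linear fit can be sketched in a few lines of Python. Everything named here (`solve`, `linear_least_squares`, the sample data) is a hypothetical illustration rather than code from the text; the linear system (H.12) is solved by plain Gaussian elimination with partial pivoting, which merely stands in for the methods of Appendix C. The covariance matrix is obtained by inverting $M$, as suggested by Eqs. (H.6), (H.7) and (H.15).

```python
def solve(M, rhs):
    """Solve M z = rhs by Gaussian elimination with partial pivoting."""
    n = len(M)
    A = [row[:] + [rhs[i]] for i, row in enumerate(M)]   # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= factor * A[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):                         # back substitution
        s = sum(A[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (A[i][n] - s) / A[i][i]
    return z

def linear_least_squares(xs, ys, basis, weights=None):
    """Return (alpha, C): fit parameters and covariance matrix C = M^{-1}."""
    if weights is None:
        weights = [1.0] * len(xs)
    m = len(basis)
    phi = [[b(x) for b in basis] for x in xs]            # phi[k][j] = phi_j(x_k)
    M = [[sum(c * p[i] * p[j] for c, p in zip(weights, phi)) for j in range(m)]
         for i in range(m)]
    beta = [sum(c * y * p[i] for c, y, p in zip(weights, ys, phi)) for i in range(m)]
    alpha = solve(M, beta)
    # Column j of the inverse follows from solving M z = e_j.
    cols = [solve(M, [1.0 if r == j else 0.0 for r in range(m)]) for j in range(m)]
    cov = [[cols[j][i] for j in range(m)] for i in range(m)]
    return alpha, cov

# Straight line y = alpha_1 + alpha_2 * x through four exact data points.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [2.0 + 3.0 * x for x in xs]
alpha, cov = linear_least_squares(xs, ys, [lambda x: 1.0, lambda x: x])
print(alpha, cov)
```

The basis functions are passed as a list of callables, so polynomial or trigonometric fits only require a different list; the routine itself stays unchanged because the problem is linear in the parameters.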


H.3  Nonlinear  Least  Squares  Fit 

Before we discuss the most general case of a completely arbitrary model function
$f(x_k, \{\alpha_j\})$ we want to point out that it is in most cases of advantage to linearize the
model function if at all possible. For instance, if the model function is an exponential
function, it may be linearized by taking the data points $\ln(y_k)$ instead of $y_k$.

However, if this is not possible there are numerous alternatives to find a solution
of the problem. For instance, the Gauss-Newton method can be employed if
the model function $f(x_k, \{\alpha_j\})$ and its derivatives with respect to the parameters
$\alpha_j$ are known analytically. Another possibility is offered by the application of a
deterministic optimization algorithm as they will be introduced in Appendix I. If
even this method is not applicable, the methods of stochastic optimization, discussed
in Chap. 20, might be an obvious choice.

We describe now the Gauss-Newton method which is essentially a generalization
of the Newton method presented in Appendix B. The Gauss-Newton
method is a method developed to minimize the expression (H.1) iteratively. The
derivatives

$$ \frac{\partial f(x_k, \{\alpha_j\})}{\partial \alpha_\ell} \tag{H.16} $$

are assumed to be known analytically. This is an iterative algorithm and, thus, an
iteration index is introduced and indicated by a superscript index $n$ as in $\alpha_j^n$. The
algorithm is described by the following steps:

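1. Choose a set of initial values $\{\alpha_j^0\}$ for the iteration.

2. Linearize the function $f(x_k, \{\alpha_j\})$ and insert the result into Eq. (H.1):

$$ \chi^2 = \sum_k c_k \left\{ y_k - f(x_k, \{\alpha_j^n\}) - \sum_\ell \left[ \frac{\partial f(x_k, \{\alpha_j\})}{\partial \alpha_\ell} \right]_{\{\alpha_j\} = \{\alpha_j^n\}} (\alpha_\ell - \alpha_\ell^n) \right\}^2 . \tag{H.17} $$

We introduce the following abbreviations for a more compact notation:

$$ \mathrm{d}f_{k,\ell}^{n} = \left[ \frac{\partial f(x_k, \{\alpha_j\})}{\partial \alpha_\ell} \right]_{\{\alpha_j\} = \{\alpha_j^n\}} \tag{H.18} $$

and

$$ f_k^n = f(x_k, \{\alpha_j^n\}) . \tag{H.19} $$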

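3. We have to solve

$$ \frac{\partial \chi^2}{\partial \alpha_i} = -2 \sum_k c_k \, \mathrm{d}f_{k,i}^{n} \left[ y_k - f_k^n - \sum_\ell \mathrm{d}f_{k,\ell}^{n} (\alpha_\ell - \alpha_\ell^n) \right] = 0 , \tag{H.20} $$

for all parameters $\{\alpha_j\}$. Therefore, we introduce vectors $\alpha = (\alpha_1, \alpha_2, \ldots)^T$, $\beta =
(\beta_1, \beta_2, \ldots)^T$ with

$$ \beta_i = \sum_k c_k \left( y_k - f_k^n \right) \mathrm{d}f_{k,i}^{n} , \tag{H.21} $$

and the matrix $M$ with elements:

$$ M_{ij} = \sum_k c_k \, \mathrm{d}f_{k,i}^{n} \, \mathrm{d}f_{k,j}^{n} . \tag{H.22} $$

This transforms Eq. (H.20) into a linear system of equations

$$ M (\alpha - \alpha^n) = \beta , \tag{H.23} $$

which is solved for $\Delta\alpha^n = \alpha - \alpha^n$. Please note that $\alpha^n$ denotes the vector $\alpha$ after
$n$ iterations. The vector $\alpha^{n+1}$ for the next iteration step is guessed from:

$$ \alpha^{n+1} = \alpha^n + \Delta\alpha^n . \tag{H.24} $$

4. The iteration is terminated if for all parameters the desired accuracy was
achieved. For instance, the condition $|\alpha_j^{n+1} - \alpha_j^n| < \epsilon$ can be used with $\epsilon$ a
small parameter. A criterion for the relative error can be formulated analogously.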

Some comments concerning the covariance matrix are in order: It is more
complicated in the nonlinear case because we also have to consider the second
partial derivatives of the model function $f(x_k, \{\alpha_j\})$. However, if these can for some
reason be neglected we obtain, again, that $N_{ij} = M_{ij}$, as in Sect. H.2.
Another, more serious problem is found in the fact that the Gauss-Newton
method suffers from severe instability problems. However, a possible remedy was
formulated by D. Marquardt [54] who suggested multiplying the diagonal elements
of the matrix $M$ with a factor $(1 + \lambda)$ where $\lambda > 0$. A detailed analysis shows that
one can choose $\lambda$ sufficiently large and in such a way that the value of $\chi_n^2$ decreases
monotonically, i.e. $\chi_{n+1}^2 \leq \chi_n^2$ for all iteration steps $n$. However, an increase of $\lambda$
decreases the convergence rate and more iterations are necessary until the required
accuracy is obtained. It is therefore desirable to choose $\lambda$ values in such a way
that the error decreases monotonically but that, at the same time, a convergence
rate is maintained which is as large as possible. A possible strategy is to start with
some given value of $\lambda$ and to reduce it after every iteration step by a constant rate.
However, if at some point the error $\chi^2$ increases, i.e. $\chi_{n+1}^2 > \chi_n^2$, then $\lambda$ has again to
be increased.
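The Gauss-Newton iteration with the damping of the diagonal elements can be sketched as follows. This is a minimal illustration under assumptions that are not part of the text: the exponential model $f(x, \alpha) = \alpha_1 \exp(-\alpha_2 x)$, the synthetic data, the starting values, the factor 10 used to raise or lower $\lambda$, and all function names are hypothetical choices.

```python
import math

def solve(M, rhs):
    """Solve M z = rhs by Gaussian elimination with partial pivoting."""
    n = len(M)
    A = [row[:] + [rhs[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= factor * A[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (A[i][n] - sum(A[i][j] * z[j] for j in range(i + 1, n))) / A[i][i]
    return z

def chi2(xs, ys, cs, model, alpha):
    return sum(c * (y - model(x, alpha)) ** 2 for x, y, c in zip(xs, ys, cs))

def gauss_newton_marquardt(xs, ys, model, grad, alpha0, cs=None,
                           lam=1e-2, tol=1e-10, max_iter=200):
    cs = [1.0] * len(xs) if cs is None else cs
    alpha = list(alpha0)
    m = len(alpha)
    current = chi2(xs, ys, cs, model, alpha)
    for _ in range(max_iter):
        df = [grad(x, alpha) for x in xs]                    # df[k][i], Eq. (H.18)
        res = [y - model(x, alpha) for x, y in zip(xs, ys)]  # y_k - f_k^n
        beta = [sum(c * r * d[i] for c, r, d in zip(cs, res, df)) for i in range(m)]
        M = [[sum(c * d[i] * d[j] for c, d in zip(cs, df)) for j in range(m)]
             for i in range(m)]
        while True:
            damped = [[M[i][j] * (1.0 + lam) if i == j else M[i][j]
                       for j in range(m)] for i in range(m)]
            delta = solve(damped, beta)
            trial = [a + d for a, d in zip(alpha, delta)]
            trial_chi2 = chi2(xs, ys, cs, model, trial)
            if trial_chi2 <= current:
                break
            lam *= 10.0              # chi^2 increased: damp more strongly
            if lam > 1e12:
                return alpha         # no acceptable step could be found
        alpha, current = trial, trial_chi2
        lam /= 10.0                  # successful step: relax the damping
        if all(abs(d) < tol for d in delta):
            break
    return alpha

def model(x, alpha):                 # f(x) = alpha_1 * exp(-alpha_2 * x)
    return alpha[0] * math.exp(-alpha[1] * x)

def grad(x, alpha):                  # derivatives with respect to alpha_1, alpha_2
    e = math.exp(-alpha[1] * x)
    return [e, -alpha[0] * x * e]

xs = [float(k) for k in range(10)]
ys = [2.0 * math.exp(-x / 3.0) for x in xs]
fit = gauss_newton_marquardt(xs, ys, model, grad, [1.0, 1.0])
print(fit)
```

Starting from $(1, 1)$, the undamped first step would overshoot to a negative decay rate; the damping rejects such steps because they increase $\chi^2$, which is the instability the modification is meant to cure.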


Appendix  I 

Deterministic  Optimization 


I.1  Introduction

We use the term deterministic optimization to distinguish these particular optimization
methods from the stochastic optimization methods discussed in Chap. 20. There
are numerous different deterministic methods designed to find the minimum (or
maximum) of a given function $f(x)$, where $x$ can be a vector. Roughly speaking,
we can distinguish between methods which require the knowledge of the Hessian,¹
methods which need gradients only, and methods which are based on function values
only. For instance, if the gradient of a function is known analytically one may exploit
Newton's method, as it was introduced in Appendix B. Note that such an approach
requires the Hessian of the function $f(x)$.

We  plan  to  discuss  here  in  some  detail  two  specific  methods,  namely  the  method 
of  steepest  descent  and  the  method  of  conjugate  gradients.  Both  methods  require 
the  knowledge  of  the  gradient  of  the  function;  however,  the  gradient  can  also  be
approximated  with  the  help  of  finite  differences  (see  Chap.  2).  A  discussion  of 
additional  methods  is  beyond  the  scope  of  this  book  and  the  interested  reader  is 
referred  to  the  available  literature  [55]. 

However, before discussing these two methods in more detail, let us briefly
consider the quadratic problem which can be solved analytically. In this case the
function $f(x)$ can be written as

$$ f(x) = \frac{1}{2} x^T A x - b^T x + c , \tag{I.1} $$

where $x \in \mathbb{R}^N$, $A \in \mathbb{R}^{N \times N}$, $b \in \mathbb{R}^N$, and $c \in \mathbb{R}$, where we restrict the
discussion to real valued functions for reasons of simplicity. We demonstrate now that for
symmetric and positive definite matrices $A$, i.e. $A^T = A$ and $x^T A x > 0$ for all $x \neq 0$, the
minimum of $f(x)$ is given by $x = A^{-1} b$. The gradient of $f(x)$ is readily evaluated
and is given by²

$$ \nabla f(x) = \frac{1}{2} A^T x + \frac{1}{2} A x - b . \tag{I.2} $$

With $A^T = A$, the condition $\nabla f(x) = 0$ immediately yields the desired result:

$$ A x = b . \tag{I.3} $$

1 The Hessian, or Hesse matrix, $H \in \mathbb{R}^{N \times N}$ of a function $f(x)$, $x \in \mathbb{R}^N$, is the Jacobian of the
Jacobian $J(x)$ of $f(x)$ defined in Eq. (B.8). Thus, it is the matrix of second order partial derivatives
of a function. It describes the local curvature of a function of many variables.

It follows that $x = A^{-1} b$ is a minimum because we assumed $A$ to be positive
definite. It is possible to solve the optimization problem even if $A$ is not symmetric
by inverting the symmetrized matrix $(A + A^T)/2$. Finally, the linear equation (I.3)
can be solved with the methods discussed in Appendix C.
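A two-dimensional example makes the statement concrete. The matrix, the vector and the helper names in the following sketch are illustrative assumptions; the $2 \times 2$ system is solved with Cramer's rule only to keep the example short.

```python
def quadratic(x, A, b, c):
    """f(x) = x^T A x / 2 - b^T x + c for lists x, b and a nested list A."""
    n = len(x)
    Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    return 0.5 * sum(x[i] * Ax[i] for i in range(n)) - sum(b[i] * x[i] for i in range(n)) + c

def solve_2x2(A, b):
    """Cramer's rule for a 2 x 2 system A x = b."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(b[0] * A[1][1] - A[0][1] * b[1]) / det,
            (A[0][0] * b[1] - A[1][0] * b[0]) / det]

A = [[4.0, 1.0], [1.0, 3.0]]     # symmetric and positive definite
b = [1.0, 2.0]
c = 0.0

x_min = solve_2x2(A, b)          # candidate minimum x = A^{-1} b
f_min = quadratic(x_min, A, b, c)

# Function values at a few displaced points; all should lie above f_min.
shifts = [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1), (0.3, -0.2)]
neighbours = [quadratic([x_min[0] + dx, x_min[1] + dy], A, b, c) for dx, dy in shifts]
print(x_min, f_min, neighbours)
```

For a non-symmetric matrix one would first replace $A$ by $(A + A^T)/2$, which leaves the value of $x^T A x$ unchanged.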


I.2  Steepest  Descent

The simplest gradient based method is the method of steepest descent [6]. It
is based on the rather straightforward idea of moving in each iteration step into
the opposite direction of the gradient, i.e. downhill. Hence, we may formulate it
mathematically in the following way: Let $x_n$ be the current position of our search
for the minimum. Then we choose

$$ x_{n+1} = x_n - \alpha_n \nabla f(x_n) , \tag{I.4} $$

where the step-size in direction of the negative gradient, $\alpha_n$, has to be determined in
an additional step. The step-size should be chosen in such a way that we reach the
line minimum in direction $\nabla f(x_n)$:

$$ \frac{\mathrm{d}}{\mathrm{d}\alpha_n} f[x_{n+1}(\alpha_n)] = -\nabla f(x_{n+1}) \cdot \nabla f(x_n) = 0 . \tag{I.5} $$

Hence, we observe that for an optimal choice of $\alpha_n$ the search directions are
orthogonal. In practice $\alpha_n$ is estimated with the help of a separate minimization
technique, such as bisection. This technique has already been used in our discussion
of the shooting methods in Chap. 10.

2 We remember from vector analysis that
$\nabla_x (x^T A x) = \nabla_x (x^T A) x + \nabla_x (x^T A^T) x = (A + A^T) x$.

We provide an example which is supposed to make the method more transparent
and to help in the discussion of its caveats: We want to determine the global
minimum of the function

$$ f(x, y) = \cos(2x) + \sin(4y) + \exp(1.5 x^2 + 0.7 y^2) + 2x . \tag{I.6} $$

Its gradient is easily evaluated

$$ \frac{\partial f(x, y)}{\partial x} = -2 \sin(2x) + 3x \exp(1.5 x^2 + 0.7 y^2) + 2 , \tag{I.7} $$

and

$$ \frac{\partial f(x, y)}{\partial y} = 4 \cos(4y) + 1.4 y \exp(1.5 x^2 + 0.7 y^2) . \tag{I.8} $$

We define the algorithm steepest descent with the following steps:

1. Choose some initial values $x_0$ and $y_0$.

2. Calculate the gradient $\nabla f(x_n, y_n)$ in iteration step $n$.

3. Determine $\alpha_n$ in such a way that

   $$ f[x_{n+1}(\alpha_n), y_{n+1}(\alpha_n)] \to \min , \tag{I.9} $$

   which is equivalent to

   $$ g(\alpha_n) := \nabla f[x_{n+1}(\alpha_n), y_{n+1}(\alpha_n)] \cdot \nabla f(x_n, y_n) = 0 . \tag{I.10} $$

   This is achieved by a bisection technique similar to the one employed in
   Sect. 10.3:

   a. Set $\alpha_n^a = 0$ and choose $\alpha_n^b$ arbitrarily.

   b. Increase $\alpha_n^b$ until $g(\alpha_n^a) g(\alpha_n^b) < 0$.

   c. Define

      $$ \alpha_n^c = \frac{\alpha_n^a + \alpha_n^b}{2} , \tag{I.11} $$

      and determine $g(\alpha_n^c)$.

   d. If $g(\alpha_n^a) g(\alpha_n^c) < 0$, set $\alpha_n^b = \alpha_n^c$ and return to step c. Otherwise, set
      $\alpha_n^a = \alpha_n^c$ and return to step c.

   e. The bisection is terminated if $|g(\alpha_n^c)| < \epsilon$, with $\epsilon$ some required accuracy for
      the bisection part.

4. Check whether $|f(x_{n+1}, y_{n+1}) - f(x_n, y_n)| < \eta$ with $\eta$ some required accuracy.
   Return to step 2 for the next iteration step if the algorithm is not converged.

Fig. I.1 Iteration sequence of the method of steepest descent for three different
starting points

The above algorithm was executed for the function $f(x, y)$ given by Eq. (I.6) for
three different starting points, $(0.8, -0.75)$, $(0.8, 1.05)$, and $(-1.05, 1.05)$. The
function $f(x, y)$ as well as the iteration sequence towards the minimum for all three
starting points is illustrated in Fig. I.1.

We note the following properties of the method: First of all it is a rather slow
method due to the orthogonality of subsequent search directions. Moreover, as we
observe from Fig. I.1, we can only find the local minimum closest to the starting
point and not the global minimum of the function $f(x, y)$. The convergence rate is
also highly affected by the choice of the initial position. However, it is a very simple
method which works in spaces of arbitrary dimension.
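The steps of the steepest descent algorithm translate almost directly into code. The Python sketch below applies them to the function (I.6) for the three starting points quoted above. Several details are assumptions made for illustration and are not prescribed by the text: the initial trial step $10^{-4}$ that is doubled until the sign of $g$ changes, the relative tolerance used in the bisection, the iteration caps, and all function names.

```python
import math

def f(x, y):
    return math.cos(2 * x) + math.sin(4 * y) + math.exp(1.5 * x * x + 0.7 * y * y) + 2 * x

def grad(x, y):
    e = math.exp(1.5 * x * x + 0.7 * y * y)
    return (-2 * math.sin(2 * x) + 3 * x * e + 2, 4 * math.cos(4 * y) + 1.4 * y * e)

def line_step(x, y, gx, gy, eps=1e-6):
    """Bisection for g(alpha) = grad f(x - alpha*grad) . grad = 0 (steps 3a to 3e)."""
    def g(alpha):
        nx, ny = grad(x - alpha * gx, y - alpha * gy)
        return nx * gx + ny * gy

    g0 = gx * gx + gy * gy            # g(0) = |grad f|^2 > 0
    a_lo, g_lo, a_hi = 0.0, g0, 1e-4
    for _ in range(60):               # step b: enlarge the interval until g changes sign
        if g_lo * g(a_hi) < 0.0:
            break
        a_hi *= 2.0
    else:
        raise RuntimeError("no sign change of g found")
    a_mid = 0.5 * (a_lo + a_hi)
    for _ in range(100):              # steps c to e
        a_mid = 0.5 * (a_lo + a_hi)
        g_mid = g(a_mid)
        if abs(g_mid) < eps * g0:
            break
        if g_lo * g_mid < 0.0:
            a_hi = a_mid
        else:
            a_lo, g_lo = a_mid, g_mid
    return a_mid

def steepest_descent(x, y, eta=1e-12, max_iter=10000):
    for _ in range(max_iter):
        gx, gy = grad(x, y)
        if gx == 0.0 and gy == 0.0:   # already at a stationary point
            break
        alpha = line_step(x, y, gx, gy)
        x_new, y_new = x - alpha * gx, y - alpha * gy
        converged = abs(f(x_new, y_new) - f(x, y)) < eta
        x, y = x_new, y_new
        if converged:
            break
    return x, y

starts = [(0.8, -0.75), (0.8, 1.05), (-1.05, 1.05)]
results = [steepest_descent(x0, y0) for x0, y0 in starts]
for start, end in zip(starts, results):
    print(start, "->", end, f(*end))
```

Each run ends in a stationary point reached by following the local slope from its starting point, so different starting points need not lead to the same minimum.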


I.3  Conjugate  Gradients

The method of conjugate gradients [6, 55] is based on the definition of $N$ orthogonal
search directions $\{\psi_i\}$ in an $N$ dimensional space. In contrast to steepest descent it
is designed in such a way that we take only one step in each search direction and the
minimum is found after at most $N$ steps, if the function $f(x)$ is of the quadratic
form (I.1). In the more general case, however, it will take more steps but will,
nevertheless, be much more efficient than the method of steepest descent. Let us
formulate the method for a general function $f(x)$.

We approximate the function $f(x)$, with $x \in \mathbb{R}^N$, in the vicinity of the reference
point $x_n$ in the $n$-th iteration step up to second order and name the resulting function
$\hat{f}_n(x)$:

$$ \hat{f}_n(x) := f(x_n) + \nabla f(x_n) \cdot (x - x_n) + \frac{1}{2} (x - x_n) \cdot \left[ H(x_n) (x - x_n) \right]
 = f(x_n) - b_n^T (x - x_n) + \frac{1}{2} (x - x_n)^T A_n (x - x_n) . \tag{I.12} $$


Here, $A_n$ denotes the Hessian at position $x_n$ and $b_n$ is the negative gradient at $x_n$.
In particular, for a quadratic function $f(x)$ the equality $\hat{f}_n(x) = f(x)$ holds. We now
write the minimum $x$ of $f(x)$ as a linear combination of search directions $\{\psi_i\}$ with
coefficients $\lambda_i$ and the initial point $x_0$:

$$ x = x_0 + \sum_{i=0}^{M} \lambda_i \psi_i . \tag{I.13} $$

Note that in the quadratic case (I.1) this sum will be restricted to $M = N - 1$. At
each iteration instance we have the relation

$$ x_{n+1} = x_n + \lambda_n \psi_n , \tag{I.14} $$

together with the goal

$$ x_{M+1} = x . \tag{I.15} $$


Let us now define a couple of useful quantities. The deviation from the minimum at
iteration step $n + 1$, $\delta_{n+1}$, is given by

$$ \delta_{n+1} = x_{n+1} - x = x_n + \lambda_n \psi_n - x = \delta_n + \lambda_n \psi_n . \tag{I.16} $$

We define, furthermore, the residual

$$ r_{n+1} := -\nabla \hat{f}_n(x_{n+1}) = b_n - A_n (x_{n+1} - x_n) = b_n - \lambda_n A_n \psi_n , \tag{I.17} $$

where we employed that

$$ \frac{1}{2} \nabla \left[ (x - x_n)^T A_n (x - x_n) \right] = A_n (x - x_n) . \tag{I.18} $$

3 Note that the Hessian is always symmetric for real valued functions $f(x)$ due to the symmetry of
second order derivatives.


Finding the minimum of the quadratic approximation $\hat{f}_n(x)$ of $f(x)$ around $x_n$ is
equivalent to the condition

$$ r_{n+1} = 0 . \tag{I.19} $$

In particular, we have to find the product $\lambda_n \psi_n$ in such a way that $r_{n+1} = 0$. Of
course, we could invert the Hessian $A_n$ in order to obtain this result. However,
this would be too expensive from a computational point of view. The idea is to
apply the ideal search strategy for quadratic functions to $\hat{f}_n(x)$ in order to obtain
$x_{n+1}$. Hence, the method of conjugate gradients executes packages of $N$ steps,
where each package solves the quadratic problem around $x_n$, until the minimum
of the original function $f(x)$ has been found. Therefore, we have to generalize the
relations (I.14), (I.16), and (I.17) for iterations within step $n$.

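We have, in particular, for every iteration step $n$

$$ x_{n+1} = x_n + \sum_{\ell=0}^{N-1} \lambda_n^{\ell} \psi_n^{\ell} , \tag{I.20} $$

together with the definitions

$$ x_n^{\ell+1} = x_n^{\ell} + \lambda_n^{\ell} \psi_n^{\ell} , \tag{I.21} $$

where $x_n^0 = x_n$. Furthermore, we define the deviation

$$ \delta_n^{\ell+1} = x_n^{\ell+1} - x_{n+1} = \delta_n^{\ell} + \lambda_n^{\ell} \psi_n^{\ell} , \tag{I.22} $$

and the residual

$$ r_n^{\ell+1} = -\nabla \hat{f}_n(x_n^{\ell+1}) = b_n - A_n (x_n^{\ell+1} - x_n) . \tag{I.23} $$

In contrast to relation (I.17), Eq. (I.23) features the difference $(x_n^{\ell+1} - x_n)$ rather than
$(x_{n+1} - x_n)$. We insert the recurrence (I.21) and obtain

$$ r_n^{\ell+1} = b_n - A_n (x_n^{\ell} - x_n) - \lambda_n^{\ell} A_n \psi_n^{\ell} = r_n^{\ell} - \lambda_n^{\ell} A_n \psi_n^{\ell} . \tag{I.24} $$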




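Hence, in contrast to relation (I.17), Eq. (I.24) defines a recurrence relation. Again,
we want to choose the search directions $\psi_n^{\ell}$ and the step lengths $\lambda_n^{\ell}$ in such a way that
we find the minimum as quickly as possible. Suppose we already knew the search
direction $\psi_n^{\ell}$. The line minimum in this direction is then given by

$$ \frac{\mathrm{d}}{\mathrm{d}\lambda_n^{\ell}} \hat{f}_n(x_n^{\ell+1}) = \nabla \hat{f}_n(x_n^{\ell+1}) \cdot \psi_n^{\ell}
 = -(r_n^{\ell+1})^T \psi_n^{\ell}
 = -(r_n^{\ell} - \lambda_n^{\ell} A_n \psi_n^{\ell})^T \psi_n^{\ell}
 = -(r_n^{\ell})^T \psi_n^{\ell} + \lambda_n^{\ell} (\psi_n^{\ell})^T A_n \psi_n^{\ell}
 = 0 , \tag{I.25} $$

and we have

$$ \lambda_n^{\ell} = \frac{(\psi_n^{\ell})^T r_n^{\ell}}{(\psi_n^{\ell})^T A_n \psi_n^{\ell}} . \tag{I.26} $$

Hence, the remaining unknown quantities in our algorithm are the search directions
$\psi_n^{\ell}$. So far, the only information we obtained is that the search direction $\psi_n^{\ell}$ is
orthogonal to the residual $r_n^{\ell+1}$, see Eq. (I.25).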

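However, we also know that

$$ 0 = A_n (x_{n+1} - x_n) - b_n = A_n \sum_{\ell=0}^{N-1} \lambda_n^{\ell} \psi_n^{\ell} - b_n , \tag{I.27} $$

and therefore

$$ 0 = \sum_{\ell=0}^{N-1} \lambda_n^{\ell} (\psi_n^{k})^T A_n \psi_n^{\ell} - (\psi_n^{k})^T b_n , \tag{I.28} $$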

for arbitrary $k$. A sufficient condition to ensure the validity of relation (I.28) is to
impose $A_n$-orthogonality:

$$ \langle \psi_n^{k} | \psi_n^{\ell} \rangle_{A_n} = (\psi_n^{k})^T A_n \psi_n^{\ell} = \delta_{k\ell} \, \langle \psi_n^{\ell} | \psi_n^{\ell} \rangle_{A_n} . \tag{I.29} $$

We note that $\langle \psi_n^{k} | \psi_n^{\ell} \rangle_{A_n}$ constitutes indeed a scalar product since $A_n$ is positive
definite in the neighborhood of a minimum.

Let us briefly demonstrate that the choice (I.29) fulfills Eq. (I.28). First of all we
note that we obtain from Eq. (I.24)

$$ r_n^{\ell} = b_n - \sum_{k=0}^{\ell-1} \lambda_n^{k} A_n \psi_n^{k} , \tag{I.30} $$

and, therefore, we derive the coefficients $\lambda_n^{\ell}$ from Eq. (I.26) in the convenient form:

$$ \lambda_n^{\ell} = \frac{(\psi_n^{\ell})^T b_n}{\langle \psi_n^{\ell} | \psi_n^{\ell} \rangle_{A_n}} . \tag{I.31} $$

The condition of orthogonality (I.29) is used to rewrite Eq. (I.28) as

$$ 0 = \lambda_n^{k} \langle \psi_n^{k} | \psi_n^{k} \rangle_{A_n} - (\psi_n^{k})^T b_n , \tag{I.32} $$

which together with Eq. (I.31) proves the equality (I.28). Hence, the strategy is clear:
We choose an initial direction $\psi_n^{0}$ and then construct the further directions in such
a way that they fulfill $A_n$-orthogonality (I.29). Before discussing the construction of
search directions in more detail we observe the following property:

$$ (\psi_n^{k})^T r_n^{\ell} = (\psi_n^{k})^T b_n - \sum_{m=0}^{\ell-1} \lambda_n^{m} \langle \psi_n^{k} | \psi_n^{m} \rangle_{A_n}
 = \begin{cases} (\psi_n^{k})^T b_n & \text{for } k \geq \ell , \\ 0 & \text{else.} \end{cases} \tag{I.33} $$

This means that all search directions $\psi_n^{k}$ for $k \leq \ell - 1$ are orthogonal to the residual
$r_n^{\ell}$, or in other words, all residuals $r_n^{\ell}$ are orthogonal (in the classical sense) to all
previous search directions.

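We shall now briefly outline the resulting update algorithm for search directions:
Let $\{\varphi_n^{k}\}$ be a set of linearly independent vectors that span our search space for $\hat{f}_n(x)$.
We write the search direction $\psi_n^{k}$ as

$$ \psi_n^{k} = \varphi_n^{k} + \sum_{\ell=0}^{k-1} \beta_n^{k\ell} \psi_n^{\ell} , \tag{I.34} $$

together with

$$ \psi_n^{0} = \varphi_n^{0} . \tag{I.35} $$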

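4 In principle these linearly independent vectors $\{\varphi_n^{k}\}$ do not need to depend on the index $n$, i.e. on
the actual position $x_n$. However, we consider here the most general case as will soon become clear.

The expansion coefficients $\beta_n^{k\ell}$ can be determined recursively by imposing
$A_n$-orthogonality for all $\ell < k$:

$$ 0 = \langle \psi_n^{k} | \psi_n^{\ell} \rangle_{A_n}
 = \langle \varphi_n^{k} | \psi_n^{\ell} \rangle_{A_n} + \sum_{m=0}^{k-1} \beta_n^{km} \langle \psi_n^{m} | \psi_n^{\ell} \rangle_{A_n}
 = \langle \varphi_n^{k} | \psi_n^{\ell} \rangle_{A_n} + \beta_n^{k\ell} \langle \psi_n^{\ell} | \psi_n^{\ell} \rangle_{A_n} , \tag{I.36} $$

and, therefore:

$$ \beta_n^{k\ell} = -\frac{\langle \varphi_n^{k} | \psi_n^{\ell} \rangle_{A_n}}{\langle \psi_n^{\ell} | \psi_n^{\ell} \rangle_{A_n}} . \tag{I.37} $$

This procedure is known as the Gram-Schmidt conjugation [6, 12].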

Now, the question arises how one should choose the basis vectors $\varphi_n^{k}$ and whether
or not it is advantageous to choose the $\varphi_n^{k}$ as a function of $n$. A particularly clever
choice is to take the residuals, i.e.

$$ \varphi_n^{k} = r_n^{k} . \tag{I.38} $$


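In this case we have for $\ell < k$

$$ \beta_n^{k\ell} = -\frac{\langle r_n^{k} | \psi_n^{\ell} \rangle_{A_n}}{\langle \psi_n^{\ell} | \psi_n^{\ell} \rangle_{A_n}}
 = -\frac{(r_n^{k})^T A_n \psi_n^{\ell}}{\langle \psi_n^{\ell} | \psi_n^{\ell} \rangle_{A_n}}
 = \frac{(r_n^{k})^T}{\langle \psi_n^{\ell} | \psi_n^{\ell} \rangle_{A_n}} \left[ \frac{r_n^{\ell+1} - r_n^{\ell}}{\lambda_n^{\ell}} \right] , \tag{I.39} $$

where we used recurrence (I.24). We now calculate with the help of Eq. (I.34)

$$ (r_n^{k})^T r_n^{\ell} = (r_n^{k})^T \psi_n^{\ell} - (r_n^{k})^T \sum_{m=0}^{\ell-1} \beta_n^{\ell m} \psi_n^{m} = 0 , \tag{I.40} $$

for $\ell < k$ due to the orthogonality of the search directions and the residuals, see
Eq. (I.33). Hence, we obtain for all $\ell < k$

$$ \beta_n^{k\ell} = \frac{1}{\lambda_n^{\ell}} \frac{(r_n^{k})^T r_n^{k}}{\langle \psi_n^{\ell} | \psi_n^{\ell} \rangle_{A_n}} \, \delta_{\ell+1,k} . \tag{I.41} $$

Hence, the name conjugated gradients.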

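We are now in a position to describe the algorithm for the method of conjugate gradients:

1. Choose an initial position $x_0$.

2. Determine the vector $b_n$ and the matrix $A_n$ for a given position $x_n$.

3. Perform the following $N$ steps in order to calculate $x_{n+1}$:

a. Set (with $x_n^0 = x_n$)

$$\psi_n^0 = r_n^0 = -\nabla f_n(x_n^0), \qquad \lambda_n^0 = \frac{(r_n^0)^T r_n^0}{\langle\psi_n^0|\psi_n^0\rangle_{A_n}}, \tag{1.42}$$

as well as

$$x_n^1 = x_n^0 + \lambda_n^0\psi_n^0. \tag{1.43}$$

b. Calculate for $k = 1, \ldots, N-1$ the residuals,

$$r_n^k = r_n^{k-1} - \lambda_n^{k-1}A_n\psi_n^{k-1}, \tag{1.44}$$

the new search directions

$$\psi_n^k = r_n^k + \frac{(r_n^k)^T r_n^k}{(r_n^{k-1})^T r_n^{k-1}}\,\psi_n^{k-1}, \tag{1.45}$$

the step lengths

$$\lambda_n^k = \frac{(r_n^k)^T r_n^k}{\langle\psi_n^k|\psi_n^k\rangle_{A_n}}, \tag{1.46}$$

and, finally, the modified positions

$$x_n^{k+1} = x_n^k + \lambda_n^k\psi_n^k. \tag{1.47}$$

4. If $|f(x_{n+1}) - f(x_n)| < \epsilon$, with $\epsilon$ some required accuracy, terminate the iteration, otherwise return to step 2. In the case of a convex quadratic function $f(x)$, terminate also after $N$ steps.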
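
The inner loop of step 3 can be sketched in pure Python for a fixed quadratic model. This is a minimal illustration, assuming the model is written as $\frac{1}{2}x^TAx - b^Tx$ so that the residual is $b - Ax$; the function name `conjugate_gradient` and the test matrix are hypothetical and not taken from the text.

```python
def dot(u, v):
    """Ordinary scalar product u^T v."""
    return sum(a * b for a, b in zip(u, v))

def matvec(A, v):
    """Matrix-vector product A v for a matrix stored as a list of rows."""
    return [dot(row, v) for row in A]

def conjugate_gradient(A, b, x0, tol=1e-12):
    """Minimize 0.5*x^T A x - b^T x for a symmetric positive definite A."""
    x = list(x0)
    r = [bi - ai for bi, ai in zip(b, matvec(A, x))]     # residual = negative gradient
    psi = list(r)                                         # first search direction
    rr = dot(r, r)
    for _ in range(len(b)):                               # at most N steps
        if rr < tol:
            break
        A_psi = matvec(A, psi)
        lam = rr / dot(psi, A_psi)                        # step length along psi
        x = [xi + lam * pi for xi, pi in zip(x, psi)]     # modified position
        r = [ri - lam * ai for ri, ai in zip(r, A_psi)]   # residual recurrence
        rr_new = dot(r, r)
        # new direction: current residual plus a multiple of the previous direction
        psi = [ri + (rr_new / rr) * pi for ri, pi in zip(r, psi)]
        rr = rr_new
    return x

# Hypothetical 2x2 example; the exact minimum is (1/11, 7/11)
x_min = conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0], [2.0, 1.0])
print(x_min)
```

For a general function $f$, this loop would be repeated with a new matrix and vector at every outer iteration, which is exactly where the Hessian enters.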

Strictly speaking, this algorithm is only valid for convex functions, because one runs into trouble whenever a position is reached at which the Hessian is not positive definite. It is therefore desirable to exclude the Hessian from the algorithm. This can be achieved by an algorithm developed by Fletcher and Reeves [56]. Based on our previous discussion the generalization is rather obvious: If we do not want to use the Hessian explicitly, we have to determine the step length $\lambda_n^l$ numerically by minimizing $f(x_n^l + \lambda_n^l\psi_n^l)$ for a given search direction $\psi_n^l$. The residuals are then taken to be the negative of the exact gradient of the function $f$ at $x_n^l$ rather than that of the quadratic approximation $f_n$. The next search direction is then determined via

$$\psi_n^{l+1} = -\nabla f(x_n^{l+1}) + \frac{\|\nabla f(x_n^{l+1})\|^2}{\|\nabla f(x_n^l)\|^2}\,\psi_n^l. \tag{1.48}$$


Hence, we have the following algorithm (Fletcher-Reeves algorithm):

1. Choose an initial position $x_0$.

2. Perform the following $N$ steps in order to calculate $x_{n+1}$:

a. Set

$$\psi_n^0 = -\nabla f(x_n). \tag{1.49}$$

b. Calculate for $k = 0, \ldots, N-1$ the step length $\lambda_n^k$ by minimizing $f(x_n^k + \lambda_n^k\psi_n^k)$, the new position $x_n^{k+1} = x_n^k + \lambda_n^k\psi_n^k$, and the new search direction via

$$\psi_n^{k+1} = -\nabla f(x_n^{k+1}) + \frac{\|\nabla f(x_n^{k+1})\|^2}{\|\nabla f(x_n^k)\|^2}\,\psi_n^k. \tag{1.50}$$

3. If $|f(x_{n+1}) - f(x_n)| < \epsilon$, with $\epsilon$ some required accuracy, terminate the iteration, otherwise return to step 2.
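
The Fletcher-Reeves steps above can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions: the line minimization uses a golden-section search on a bracket found by repeated doubling (one possible choice, the text does not prescribe a method), the function is assumed to be convex along every search direction, and all names (`golden_section`, `line_minimum`, `fletcher_reeves`) are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def golden_section(phi, a, b, tol=1e-10):
    """Minimize a unimodal function phi on the interval [a, b]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - g * (b - a), a + g * (b - a)
    while abs(b - a) > tol:
        if phi(c) < phi(d):          # minimum lies in [a, d]
            b, d = d, c
            c = b - g * (b - a)
        else:                        # minimum lies in [c, b]
            a, c = c, d
            d = a + g * (b - a)
    return 0.5 * (a + b)

def line_minimum(f, x, psi):
    """Step length minimizing f(x + lam*psi), assuming convexity along psi."""
    phi = lambda lam: f([xi + lam * pi for xi, pi in zip(x, psi)])
    hi = 1.0
    while phi(2.0 * hi) < phi(hi):   # expand until f stops decreasing
        hi *= 2.0
    return golden_section(phi, 0.0, 2.0 * hi)

def fletcher_reeves(f, grad, x0, eps=1e-7, max_outer=100):
    x = list(x0)
    for _ in range(max_outer):
        f_old = f(x)
        g = grad(x)
        psi = [-gi for gi in g]                      # initial direction: negative gradient
        for _ in range(len(x)):                      # N inner steps
            gg = dot(g, g)
            if gg == 0.0:                            # already at a stationary point
                break
            lam = line_minimum(f, x, psi)
            x = [xi + lam * pi for xi, pi in zip(x, psi)]
            g_new = grad(x)
            beta = dot(g_new, g_new) / gg            # ratio of squared gradient norms
            psi = [-gn + beta * pi for gn, pi in zip(g_new, psi)]
            g = g_new
        if abs(f(x) - f_old) < eps:                  # termination criterion of step 3
            break
    return x

# Illustrative test function with an analytic gradient
f = lambda p: p[0] ** 2 + 10.0 * p[1] ** 2
grad = lambda p: [2.0 * p[0], 20.0 * p[1]]
x_min = fletcher_reeves(f, grad, [1.9, 0.4])
print(x_min)
```

No Hessian appears anywhere in this sketch; the price is several function evaluations per line minimization.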

The resulting sequence of steps towards the minimum, for the same function and initial conditions as were used for Fig. 1.1, is illustrated in Fig. 1.2. Comparing Figs. 1.1 and 1.2, we note immediately that the search strategy of the method of conjugate gradients clearly outperforms that of the method of steepest descent. In particular, if the ratio between the gradients in the $x$ and $y$ directions is large, a strategy of orthogonal search directions is disadvantageous. This particular case is illustrated in Fig. 1.3 for both steepest descent and conjugate gradients. Here we investigate the convex function

$$f(x, y) = x^2 + 10y^2, \tag{1.51}$$

together with an initial position $(x_0, y_0) = (1.9, 0.4)$. The resulting sequence of points towards the minimum is illustrated in Fig. 1.3. In the case of steepest descent the sequence approaches the minimum rather slowly since subsequent search directions have to be orthogonal to each other in the classical sense. The advantage of conjugate gradients is that $A_n$-orthogonality accelerates the convergence towards the minimum. In this example we reach it within two steps for a required absolute accuracy of $\eta = 10^{-7}$.

Fig. 1.2 Iteration sequence of the method of conjugate gradients for three different starting points

Fig. 1.3 Comparison of the iteration sequence between the method of steepest descent and the method of conjugate gradients
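
The comparison can be reproduced with a short pure-Python sketch. It assumes that Eq. (1.51) reads $f(x, y) = x^2 + 10y^2$, so that the Hessian is the constant matrix diag(2, 20) and every line minimization can be done exactly. The function name `count_steps` and the stopping rule (squared gradient norm below a tolerance, rather than the difference of function values) are illustrative choices, not taken from the text.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, v):
    return [dot(row, v) for row in A]

def count_steps(A, x0, conjugate, tol=1e-14, max_steps=10000):
    """Minimize 0.5*x^T A x with exact line searches; return (x, steps)."""
    x = list(x0)
    r = [-g for g in matvec(A, x)]                    # residual = negative gradient
    psi = list(r)
    rr = dot(r, r)
    steps = 0
    while rr > tol and steps < max_steps:
        A_psi = matvec(A, psi)
        lam = dot(psi, r) / dot(psi, A_psi)           # exact minimum along psi
        x = [xi + lam * pi for xi, pi in zip(x, psi)]
        r = [ri - lam * ai for ri, ai in zip(r, A_psi)]
        rr_new = dot(r, r)
        beta = rr_new / rr if conjugate else 0.0      # beta = 0 gives steepest descent
        psi = [ri + beta * pi for ri, pi in zip(r, psi)]
        rr = rr_new
        steps += 1
    return x, steps

A = [[2.0, 0.0], [0.0, 20.0]]                         # Hessian of x^2 + 10 y^2
x_sd, steps_sd = count_steps(A, [1.9, 0.4], conjugate=False)
x_cg, steps_cg = count_steps(A, [1.9, 0.4], conjugate=True)
print("steepest descent steps:", steps_sd)
print("conjugate gradient steps:", steps_cg)
```

Setting the coefficient of the previous direction to zero is the only difference between the two methods in this sketch, which makes the effect of conjugate search directions easy to isolate.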

As a final remark we note that the method of conjugate gradients, too, will only find a local minimum in the vicinity of the initial position. Hence, the outcome of the method depends strongly on the choice of $x_0$. Moreover, the calculation of the gradients may be very tedious and time-consuming from a numerical point of view.